Preface


Overview 


In formulating a stochastic model to describe a real phenomenon, it used to be that 
one compromised between choosing a model that is a realistic replica of the actual 
situation and choosing one whose mathematical analysis is tractable. That is, 
there did not seem to be any payoff in choosing a model that faithfully conformed
to the phenomenon under study if it were not possible to mathematically analyze 
that model. Similar considerations have led to the concentration on asymptotic 
or steady-state results as opposed to the more useful ones on transient time. 
However, the relatively recent advent of fast and inexpensive computational 
power has opened up another approach—namely, to try to model the phenomenon 
as faithfully as possible and then to rely on a simulation study to analyze it. 

In this text we show how to analyze a model by use of a simulation study. 
In particular, we first show how a computer can be utilized to generate random 
(more precisely, pseudorandom) numbers, and then how these random numbers 
can be used to generate the values of random variables from arbitrary distributions.
Using the concept of discrete events we show how to use random variables
to generate the behavior of a stochastic model over time. By continually gener- 
ating the behavior of the system we show how to obtain estimators of desired 
quantities of interest. The statistical questions of when to stop a simulation and 
what confidence to place in the resulting estimators are considered. A variety of 
ways in which one can improve on the usual simulation estimators are presented. 
In addition, we show how to use simulation to determine whether the stochastic
model chosen is consistent with a set of actual data. 


New to This Edition 
New exercises in most chapters. 


New results on generating a sequence of Bernoulli random variables 
(Example 4e). 
New results on the optimal use of the exponential to generate gamma random
variables by the rejection method (Section 5.2). 

A new example (8p) related to finding the distribution of the number of healthy 
cells that survive when all cancer cells are to be killed. 

A rewritten section (8.4) on stratified sampling, including material on poststrat- 
ification and additional material on finding the optimal number of simulation 
runs in each stratum.

A new section (8.5) on applications of stratified sampling to analysis of systems 
having Poisson arrivals (8.5.1), to computation of multidimensional integrals 
of monotone functions (8.5.2), and to compound random vectors (8.5.3). 

A new section (8.9) on variance reduction techniques useful when computing 
functions of random permutations and random subsets. 


Chapter Descriptions 


The successive chapters in this text are as follows. Chapter 1 is an introduc- 
tory chapter which presents a typical phenomenon that is of interest to study. 
Chapter 2 is a review of probability. Whereas this chapter is self-contained and 
does not assume the reader is familiar with probability, we imagine that it will 
indeed be a review for most readers. Chapter 3 deals with random numbers 
and how a variant of them (the so-called pseudorandom numbers) can be gen- 
erated on a computer. The use of random numbers to generate discrete and then 
continuous random variables is considered in Chapters 4 and 5. 

Chapter 6 presents the discrete event approach to track an arbitrary system as 
it evolves over time. A variety of examples—relating to both single and multiple
server queueing systems, to an insurance risk model, to an inventory system, to 
a machine repair model, and to the exercising of a stock option—are presented. 
Chapter 7 introduces the subject matter of statistics. Assuming that our average 
reader has not previously studied this subject, the chapter starts with very basic 
concepts and ends by introducing the bootstrap statistical method, which is quite 
useful in analyzing the results of a simulation. 

Chapter 8 deals with the important subject of variance reduction. This is an 
attempt to improve on the usual simulation estimators by finding ones having 
the same mean and smaller variances. The chapter begins by introducing the 
technique of using antithetic variables. We note (with a proof deferred to the 
chapter’s appendix) that this always results in a variance reduction along with 
a computational savings when we are trying to estimate the expected value of 
a function that is monotone in each of its variables. We then introduce control 
variables and illustrate their usefulness in variance reduction. For instance, we 
show how control variables can be effectively utilized in analyzing queueing 
systems, reliability systems, a list reordering problem, and blackjack. We also 
indicate how to use regression packages to facilitate the resulting computations 
when using control variables. Variance reduction by use of conditional expectations
is then considered. Its use is indicated in examples dealing with estimating
$\pi$, and in analyzing finite capacity queueing systems. Also, in conjunction with a
control variate, conditional expectation is used to estimate the expected number
of events of a renewal process by some fixed time. The use of stratified sampling
as a variance reduction tool is indicated in examples dealing with queues with
varying arrival rates and evaluating integrals. The relationship between the variance
reduction techniques of conditional expectation and stratified sampling is
explained and illustrated in the estimation of the expected return in video poker. 
Applications of stratified sampling to queueing systems having Poisson arrivals, 
to computation of multidimensional integrals, and to compound random vectors 
are also given. The technique of importance sampling is next considered. We 
indicate and explain how this can be an extremely powerful variance reduction 
technique when estimating small probabilities. In doing so, we introduce the 
concept of tilted distributions and show how they can be utilized in an impor- 
tance sampling estimation of a small convolution tail probability. Applications 
of importance sampling to queueing, random walks, and random permutations, 
and to computing conditional expectations when one is conditioning on a rare
event are presented. The final variance reduction technique of Chapter 8 relates 
to the use of a common stream of random numbers. An application to valuing an 
exotic stock option that utilizes a combination of variance reduction techniques 
is presented in Section 8.7. 

Chapter 9 is concerned with statistical validation techniques, which are sta- 
tistical procedures that can be used to validate the stochastic model when some 
real data are available. Goodness of fit tests such as the chi-square test and the 
Kolmogorov-Smirnov test are presented. Other sections in this chapter deal with
the two-sample and the n-sample problems and with ways of statistically testing 
the hypothesis that a given process is a Poisson process. 

Chapter 10 is concerned with Markov chain Monte Carlo methods. These
are techniques that have greatly expanded the use of simulation in recent years.


The standard simulation paradigm for estimating $\theta = E[h(X)]$, where X is a
random vector, is to simulate independent and identically distributed copies of
X and then use the average value of h(X) as the estimator. This is the so-called
“raw” simulation estimator, which can then possibly be improved upon by using 
one or more of the variance reduction ideas of Chapter 8. However, in order to 
employ this approach it is necessary both that the distribution of X be specified 
and also that we be able to simulate from this distribution. Yet, as we see in 
Chapter 10, there are many examples where the distribution of X is known but 
we are not able to directly simulate the random vector X, and other examples 
where the distribution is not completely known but is only specified up to a 
multiplicative constant. Thus, in either case, the usual approach to estimating 
$\theta$ is not available. However, a new approach, based on generating a Markov
chain whose limiting distribution is the distribution of X, and estimating $\theta$ by
the average of the values of the function h evaluated at the successive states
of this chain, has become widely used in recent years. These Markov chain
Monte Carlo methods are explored in Chapter 10. We start, in Section 10.2, by 
introducing and presenting some of the properties of Markov chains. A general 
technique for generating a Markov chain having a limiting distribution that is 
specified up to a multiplicative constant, known as the Hastings-Metropolis 
algorithm, is presented in Section 10.3, and an application to generating a random 
element of a large “combinatorial” set is given. The most widely used version 
of the Hastings-Metropolis algorithm is known as the Gibbs sampler, and this is
presented in Section 10.4. Examples are discussed relating to such problems as 
generating random points in a region subject to a constraint that no pair of points 
are within a fixed distance of each other, to analyzing product form queueing 
networks, to analyzing a hierarchical Bayesian statistical model for predicting 
the numbers of home runs that will be hit by certain baseball players, and to 
simulating a multinomial vector conditional on the event that all outcomes occur 
at least once. An application of the methods of this chapter to deterministic 
optimization problems, called simulated annealing, is presented in Section 10.5, 
and an example concerning the traveling salesman problem is presented. The final 
section of Chapter 10 deals with the sampling importance resampling algorithm, 
which is a generalization of the acceptance-rejection technique of Chapters 4
and 5. The use of this algorithm in Bayesian statistics is indicated. 
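
A minimal sketch of the Markov chain idea described above, written in Python for illustration only (it is not code from this text). It runs a random-walk version of the Hastings-Metropolis algorithm on {0, 1, ..., n}: because the proposal is symmetric, the acceptance probability reduces to a ratio of the unnormalized weights, so the target only needs to be known up to a multiplicative constant. θ = E[h(X)] is then estimated by averaging h over the successive states. The function name `metropolis_estimate`, the burn-in length, and the example target (weights C(n, j), which make X binomial with parameters n and 1/2) are assumptions chosen for the demonstration.

```python
import math
import random


def metropolis_estimate(h, weight, n, steps=200_000, burn_in=1_000, seed=1):
    """Estimate E[h(X)] when P{X = j} is proportional to weight(j), j = 0, ..., n.

    Only the unnormalized weights are used; the normalizing constant
    cancels in the acceptance ratio.
    """
    rng = random.Random(seed)
    state = n // 2                     # arbitrary starting state
    total = 0.0
    for step in range(burn_in + steps):
        # Symmetric proposal: move one step left or right with probability 1/2 each.
        proposal = state + rng.choice((-1, 1))
        # Proposals outside {0, ..., n} are rejected, so the chain stays put.
        if 0 <= proposal <= n:
            # Accept with probability min(1, weight(proposal) / weight(state)).
            if rng.random() < weight(proposal) / weight(state):
                state = proposal
        if step >= burn_in:            # discard the early, unconverged states
            total += h(state)
    return total / steps


# Example target: weights C(10, j), so X is binomial(10, 1/2) and E[X] = 5.
print(metropolis_estimate(lambda x: x, lambda j: math.comb(10, j), 10))
```

The printed estimate should be near 5 even though the normalizing constant 2^10 is never computed.
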

Chapter 11 deals with some additional topics in simulation. In Section 11.1 
we learn of the alias method which, at the cost of some setup time, is a very 
efficient way to generate discrete random variables. Section 11.2 is concerned 
with simulating a two-dimensional Poisson process. In Section 11.3 we present
an identity concerning the covariance of the sum of dependent Bernoulli random 
variables and show how its use can result in estimators of small probabilities 
having very low variances. Applications relating to estimating the reliability of a
system, where the resulting estimator appears to be more efficient than any other known
estimator of a small system reliability, and to estimating the probability that a specified pattern
occurs by some fixed time, are given. Section 11.4 presents an efficient technique 
to employ simulation to estimate first passage time means and distributions of a 
Markov chain. An application to computing the tail probabilities of a bivariate 
normal random variable is given. Section 11.5 presents the coupling from the 
past approach to simulating a random variable whose distribution is that of the 
stationary distribution of a specified Markov chain. 
Introduction


Consider the following situation faced by a pharmacist who is thinking of setting 
up a small pharmacy where he will fill prescriptions. He plans on opening up
at 9 A.M. every weekday and expects that, on average, there will be about 32
prescriptions called in daily before 5 P.M. His experience indicates that the time
that it will take him to fill a prescription, once he begins working on it, is a 
random quantity having a mean and standard deviation of 10 and 4 minutes, 
respectively. He plans on accepting no new prescriptions after 5 P.M., although 
he will remain in the shop past this time if necessary to fill all the prescriptions 
ordered that day. Given this scenario the pharmacist is probably, among other 
things, interested in the answers to the following questions: 


1. What is the average time that he will depart his store at night?
2. What proportion of days will he still be working at 5:30 P.M.?
3. What is the average time it will take him to fill a prescription (taking into
account that he cannot begin working on a newly arrived prescription until
all earlier arriving ones have been filled)?
4. What proportion of prescriptions will be filled within 30 minutes?
5. If he changes his policy on accepting all prescriptions between 9 A.M. and
5 P.M., but rather only accepts new ones when there are fewer than five
prescriptions still needing to be filled, how many prescriptions, on average,
will be lost?
6. How would the conditions of limiting orders affect the answers to questions 1
through 4?


In order to employ mathematics to analyze this situation and answer the 
questions, we first construct a probability model. To do this it is necessary to 
make some reasonably accurate assumptions concerning the preceding scenario. 
For instance, we must make some assumptions about the probabilistic mechanism
that describes the arrivals of the daily average of 32 customers. One possible
assumption might be that the arrival rate is, in a probabilistic sense, constant over 
the day, whereas a second (probably more realistic) possible assumption is that 
the arrival rate depends on the time of day. We must then specify a probability 
distribution (having mean 10 and standard deviation 4) for the time it takes to 
service a prescription, and we must make assumptions about whether or not the 
service time of a given prescription always has this distribution or whether it 
changes as a function of other variables (e.g., the number of waiting prescriptions 
to be filled or the time of day). That is, we must make probabilistic assumptions 
about the daily arrival and service times. We must also decide if the probability 
law describing a given day changes as a function of the day of the week or 
whether it remains basically constant over time. After these assumptions, and 
possibly others, have been specified, a probability model of our scenario will 
have been constructed. 

Once a probability model has been constructed, the answers to the questions 
can, in theory, be analytically determined. However, in practice, these questions 
are much too difficult to determine analytically, and so to answer them we usually 
have to perform a simulation study. Such a study programs the probabilistic 
mechanism on a computer, and by utilizing “random numbers” it simulates 
possible occurrences from this model over a large number of days and then 
utilizes the theory of statistics to estimate the answers to questions such as those 
given. In other words, the computer program utilizes random numbers to generate 
the values of random variables having the assumed probability distributions, 
which represent the arrival times and the service times of prescriptions. Using 
these values, it determines over many days the quantities of interest related to the 
questions. It then uses statistical techniques to provide estimated answers—for 
example, if out of 1000 simulated days there are 122 in which the pharmacist is 
still working at 5:30, we would estimate that the answer to question 2 is 0.122. 
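
To make the procedure concrete, here is a rough Python sketch for question 2. The text deliberately leaves the probability model open, so every modeling choice below is an assumption: arrivals form a Poisson process with constant rate 32/480 per minute over the 480 minutes from 9 A.M. to 5 P.M., service times are gamma distributed with mean 10 and standard deviation 4 (shape 6.25, scale 1.6), and prescriptions are filled in order of arrival. Time is measured in minutes after 9 A.M., so 5:30 P.M. corresponds to t = 510. The function names are hypothetical.

```python
import random

OPEN_MINUTES = 480          # 9 A.M. to 5 P.M.
ARRIVAL_RATE = 32 / 480     # assumed constant Poisson rate (per minute)
SHAPE, SCALE = 6.25, 1.6    # assumed gamma: mean 6.25 * 1.6 = 10, variance 6.25 * 1.6**2 = 16


def simulate_day(rng):
    """Return the time (minutes after 9 A.M.) the last prescription is filled (0 if none)."""
    t = 0.0            # time of the most recent arrival
    departure = 0.0    # completion time of the most recently started prescription
    while True:
        t += rng.expovariate(ARRIVAL_RATE)   # exponential interarrival times
        if t > OPEN_MINUTES:
            break                            # no prescriptions accepted after 5 P.M.
        service = rng.gammavariate(SHAPE, SCALE)
        # First come, first served: work starts once the order has arrived
        # and the previous order is finished.
        departure = max(t, departure) + service
    return departure


def fraction_working_late(days=1000, cutoff=510.0, seed=2024):
    """Estimate the proportion of days on which he is still working at 5:30 P.M."""
    rng = random.Random(seed)
    late = sum(simulate_day(rng) > cutoff for _ in range(days))
    return late / days


print(fraction_working_late())
```

The printed value is an estimate for question 2 under these particular assumptions only; a time-varying arrival rate or a different service distribution would change it.
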

In order to be able to execute such an analysis, one must have some knowledge 
of probability so as to decide on certain probability distributions and questions 
such as whether appropriate random variables are to be assumed independent or 
not. A review of probability is provided in Chapter 2. The bases of a simulation 
study are so-called random numbers. A discussion of these quantities and how 
they are computer generated is presented in Chapter 3. Chapters 4 and 5 show 
how one can use random numbers to generate the values of random variables 
having arbitrary distributions. Discrete distributions are considered in Chapter 4 
and continuous ones in Chapter 5. After completing Chapter 5, the reader should 
have some insight into the construction of a probability model for a given system 
and also how to use random numbers to generate the values of random quantities 
related to this model. The use of these generated values to track the system as it 
evolves continuously over time—that is, the actual simulation of the system— 
is discussed in Chapter 6, where we present the concept of “discrete events” 
and indicate how to utilize these entities to obtain a systematic approach to 
simulating systems. The discrete event simulation approach leads to a computer
program, which can be written in whatever language the reader is comfortable
in, that simulates the system a large number of times. Some hints concerning 
the verification of this program—to ascertain that it is actually doing what is 
desired—are also given in Chapter 6. The use of the outputs of a simulation 
study to answer probabilistic questions concerning the model necessitates the 
use of the theory of statistics, and this subject is introduced in Chapter 7. This
chapter starts with the simplest and most basic concepts in statistics and continues
toward the recent innovation of “bootstrap statistics,” which is quite useful in
simulation. Our study of statistics indicates the importance of the variance of the 
estimators obtained from a simulation study as an indication of the efficiency 
of the simulation. In particular, the smaller this variance is, the smaller is the 
amount of simulation needed to obtain a fixed precision. As a result we are 
led, in Chapter 8, to ways of obtaining new estimators that are improvements 
over the raw simulation estimators because they have reduced variances. This 
topic of variance reduction is extremely important in a simulation study because 
it can substantially improve its efficiency. Chapter 9 shows how one can use 
the results of a simulation to verify, when some real-life data are available, 
the appropriateness of the probability model (which we have simulated) to the 
real-world situation. Chapter 10 introduces the important topic of Markov chain
Monte Carlo methods. The use of these methods has, in recent years, greatly
expanded the class of problems that can be attacked by simulation. Chapter 11 
considers a variety of additional topics. 


Exercises 


1. The following data yield the arrival times and service times that each cus- 
tomer will require, for the first 13 customers at a single server system. Upon 
arrival, a customer either enters service if the server is free or joins the waiting 
line. When the server completes work on a customer, the next one in line (i.e., 
the one who has been waiting the longest) enters service. 


Arrival Times: 12 31 63 95 99 154 198 221 304 346 411 455 537 
Service Times: 40 32 55 48 18 50 47 18 28 54 40 72 12 


(a) Determine the departure times of these 13 customers. 

(b) Repeat (a) when there are two servers and a customer can be served by 
either one. 

(c) Repeat (a) under the new assumption that when the server completes 
a service, the next customer to enter service is the one who has been 
waiting the least time. 


2. Consider a service station where customers arrive and are served in their
order of arrival. Let $A_n$, $S_n$, and $D_n$ denote, respectively, the arrival time, the
service time, and the departure time of customer n. Suppose there is a single
server and that the system is initially empty of customers.


(a) With $D_0 = 0$, argue that for $n > 0$

$$D_n - S_n = \max\{A_n, D_{n-1}\}$$


(b) Determine the corresponding recursion formula when there are two 
servers. 

(c) Determine the corresponding recursion formula when there are k servers. 

(d) Write a computer program to determine the departure times as a function 
of the arrival and service times and use it to check your answers in parts 
(a) and (b) of Exercise 1. 
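
One possible sketch for part (d), in Python (the function name `departure_times` is our own). It keeps the times at which the servers next become free in a min-heap and, taking customers in order of arrival, assigns each one to the earliest available server. With k = 1 this is exactly the recursion of part (a); the k-server version assumes, as in Exercise 1(b), that a waiting customer may be served by whichever server frees up first.

```python
import heapq


def departure_times(arrivals, services, k=1):
    """Departure times for a first-come, first-served system with k servers.

    arrivals must be in nondecreasing order; services[i] is the service
    time of the customer who arrives at arrivals[i].
    """
    free_at = [0] * k                 # time at which each server next becomes free
    heapq.heapify(free_at)
    departures = []
    for a, s in zip(arrivals, services):
        earliest = heapq.heappop(free_at)     # first server to become free
        start = max(a, earliest)              # wait if every server is busy
        departures.append(start + s)
        heapq.heappush(free_at, start + s)    # that server is busy until then
    return departures


# Data of Exercise 1
arrivals = [12, 31, 63, 95, 99, 154, 198, 221, 304, 346, 411, 455, 537]
services = [40, 32, 55, 48, 18, 50, 47, 18, 28, 54, 40, 72, 12]
print(departure_times(arrivals, services))        # one server, part (a)
print(departure_times(arrivals, services, k=2))   # two servers, part (b)
```
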


Elements of Probability 


2.1 Sample Space and Events 


Consider an experiment whose outcome is not known in advance. Let S, called 
the sample space of the experiment, denote the set of all possible outcomes. For 
example, if the experiment consists of the running of a race among the seven 
horses numbered 1 through 7, then 


S = {all orderings of (1, 2,3, 4,5, 6, 7)} 


The outcome (3, 4, 1, 7, 6, 5, 2) means, for example, that the number 3 horse 
came in first, the number 4 horse came in second, and so on. 

Any subset A of the sample space is known as an event. That is, an event is 
a set consisting of possible outcomes of the experiment. If the outcome of the 
experiment is contained in A, we say that A has occurred. For example, in the 
above, if 


A = {all outcomes in S starting with 5} 


then A is the event that the number 5 horse comes in first. 

For any two events A and B we define the new event $A \cup B$, called the union
of A and B, to consist of all outcomes that are either in A or B or in both A
and B. Similarly, we define the event AB, called the intersection of A and B, to
consist of all outcomes that are in both A and B. That is, the event $A \cup B$ occurs
if either A or B occurs, whereas the event AB occurs if both A and B occur. We
can also define unions and intersections of more than two events. In particular,
the union of the events $A_1, \ldots, A_n$, designated by $\bigcup_{i=1}^n A_i$, is defined to consist
of all outcomes that are in any of the $A_i$. Similarly, the intersection of the events
$A_1, \ldots, A_n$, designated by $A_1 A_2 \cdots A_n$, is defined to consist of all outcomes
that are in all of the $A_i$.

For any event A we define the event $A^c$, referred to as the complement of A,
to consist of all outcomes in the sample space S that are not in A. That is, $A^c$
occurs if and only if A does not. Since the outcome of the experiment must lie
in the sample space S, it follows that $S^c$ does not contain any outcomes and thus
cannot occur. We call $S^c$ the null set and designate it by $\emptyset$. If $AB = \emptyset$ so that A
and B cannot both occur (since there are no outcomes that are in both A and B),
we say that A and B are mutually exclusive.
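
These definitions correspond directly to set operations in a programming language. A small Python sketch on the horse-race sample space; besides the event A defined above, the events B (horse 3 comes in second) and C (horse 3 comes in first) are illustrative choices of our own:

```python
from itertools import permutations

S = set(permutations(range(1, 8)))     # all orderings of horses 1..7
A = {s for s in S if s[0] == 5}        # horse 5 comes in first
B = {s for s in S if s[1] == 3}        # horse 3 comes in second
C = {s for s in S if s[0] == 3}        # horse 3 comes in first

union = A | B                          # A ∪ B: A or B (or both) occurs
intersection = A & B                   # AB: both A and B occur
complement_A = S - A                   # A^c: A does not occur

print(len(S), len(A), len(union), len(intersection), len(complement_A))
# 5040 720 1320 120 4320
print(A & C == set())                  # A and C are mutually exclusive
# True
```
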


2.2 Axioms of Probability 


Suppose that for each event A of an experiment having sample space S there is 
a number, denoted by P(A) and called the probability of the event A, which is 
in accord with the following three axioms: 


Axiom 1 $0 \le P(A) \le 1$
Axiom 2 $P(S) = 1$
Axiom 3 For any sequence of mutually exclusive events $A_1, A_2, \ldots$

$$P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n P(A_i), \qquad n = 1, 2, \ldots, \infty$$

Thus, Axiom 1 states that the probability that the outcome of the experiment
lies within A is some number between 0 and 1; Axiom 2 states that with 
probability 1 this outcome is a member of the sample space; and Axiom 3 states 
that for any set of mutually exclusive events, the probability that at least one of 
these events occurs is equal to the sum of their respective probabilities. 

These three axioms can be used to prove a variety of results about probabilities. 
For instance, since A and $A^c$ are always mutually exclusive, and since $A \cup A^c = S$,
we have from Axioms 2 and 3 that

$$1 = P(S) = P(A \cup A^c) = P(A) + P(A^c)$$

or equivalently

$$P(A^c) = 1 - P(A)$$

In words, the probability that an event does not occur is 1 minus the probability
that it does.


2.3 Conditional Probability and Independence


Consider an experiment that consists of flipping a coin twice, noting each time 
whether the result was heads or tails. The sample space of this experiment can 
be taken to be the following set of four outcomes: 


S = {(H, H), (H, T), (T, H), (T, T)}


where (H, T) means, for example, that the first flip lands heads and the second 
tails. Suppose now that each of the four possible outcomes is equally likely
to occur and thus has probability 1/4. Suppose further that we observe that the
first flip lands on heads. Then, given this information, what is the probability
that both flips land on heads? To calculate this probability we reason as follows:
Given that the initial flip lands heads, there can be at most two possible
outcomes of our experiment, namely, (H, H) or (H, T). In addition, as each of
these outcomes originally had the same probability of occurring, they should
still have equal probabilities. That is, given that the first flip lands heads, the
(conditional) probability of each of the outcomes (H, H) and (H, T) is 1/2, whereas
the (conditional) probability of the other two outcomes is 0. Hence the desired
probability is 1/2.

If we let A and B denote, respectively, the event that both flips land on heads 
and the event that the first flip lands on heads, then the probability obtained 
above is called the conditional probability of A given that B has occurred and is 
denoted by 


P(A|B) 


A general formula for P(A|B) that is valid for all experiments and events A and 
B can be obtained in the same manner as given previously. Namely, if the event
B occurs, then in order for A to occur it is necessary that the actual occurrence
be a point in both A and B; that is, it must be in AB. Now since we know that
B has occurred, it follows that B becomes our new sample space and hence the 
probability that the event AB occurs will equal the probability of AB relative to 
the probability of B. That is, 


$$P(A\mid B) = \frac{P(AB)}{P(B)}$$

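
The formula can be checked on the coin flip example by brute-force enumeration. A minimal Python sketch, assuming equally likely outcomes as in that example and using exact fractions (the helper names `P` and `P_given` are ours):

```python
from fractions import Fraction

# Two flips of a coin; each of the four outcomes has probability 1/4.
S = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
prob = {outcome: Fraction(1, 4) for outcome in S}

A = {o for o in S if o == ("H", "H")}   # both flips land heads
B = {o for o in S if o[0] == "H"}       # first flip lands heads


def P(event):
    """Probability of a set of outcomes."""
    return sum((prob[o] for o in event), Fraction(0))


def P_given(event, given):
    """Conditional probability P(event | given) = P(event ∩ given) / P(given)."""
    return P(event & given) / P(given)


print(P_given(A, B))   # 1/2
```
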
The determination of the probability that some event A occurs is often simpli- 
fied by considering a second event B and then determining both the conditional 
probability of A given that B occurs and the conditional probability of A given 
that B does not occur. To do this, note first that 


$$A = AB \cup AB^c$$

Because AB and $AB^c$ are mutually exclusive, the preceding yields

$$P(A) = P(AB) + P(AB^c) = P(A\mid B)P(B) + P(A\mid B^c)P(B^c)$$


When we utilize the preceding formula, we say that we are computing P(A) by 
conditioning on whether or not B occurs. 


Example 2a An insurance company classifies its policy holders as being
either accident prone or not. Their data indicate that an accident prone person will
file a claim within a one-year period with probability .25, with this probability
falling to .10 for a non-accident-prone person. If a new policy holder is accident
prone with probability .4, what is the probability he or she will file a claim
within a year?


Solution Let C be the event that a claim will be filed, and let B be the event
that the policy holder is accident prone. Then

$$P(C) = P(C\mid B)P(B) + P(C\mid B^c)P(B^c) = (.25)(.4) + (.10)(.6) = .16$$

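
Since this is a text on simulation, it is natural to check the answer by simulation as well. A minimal Python sketch (the function name, trial count, and seed are our own choices) that first decides whether a policy holder is accident prone and then whether a claim is filed, mirroring the conditioning in the solution:

```python
import random


def estimate_claim_probability(trials=100_000, seed=7):
    """Simulate policy holders and return the fraction who file a claim."""
    rng = random.Random(seed)
    claims = 0
    for _ in range(trials):
        accident_prone = rng.random() < 0.4              # event B occurs
        p_claim = 0.25 if accident_prone else 0.10       # P(C | B) or P(C | B^c)
        if rng.random() < p_claim:                       # event C occurs
            claims += 1
    return claims / trials


exact = 0.25 * 0.4 + 0.10 * 0.6    # conditioning on whether B occurs
print(exact, estimate_claim_probability())
```

Both printed values should be near .16; the simulated one varies slightly with the seed and the number of trials.
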
Suppose that exactly one of the events $B_i$, $i = 1, \ldots, n$, must occur. That is,
suppose that $B_1, B_2, \ldots, B_n$ are mutually exclusive events whose union is the
sample space S. Then we can also compute the probability of an event A by
conditioning on which of the $B_i$ occurs. The formula for this is obtained by using
that

$$A = AS = A\left(\bigcup_{i=1}^n B_i\right) = \bigcup_{i=1}^n AB_i$$

which, since the events $AB_i$ are mutually exclusive, implies that

$$P(A) = \sum_{i=1}^n P(AB_i) = \sum_{i=1}^n P(A\mid B_i)P(B_i)$$


Example 2b Suppose there are k types of coupons, and that each new one
collected is, independent of previous ones, a type j coupon with probability
$p_j$, $\sum_{j=1}^k p_j = 1$. Find the probability that the nth coupon collected is a different
type than any of the preceding $n - 1$.

Solution Let N be the event that coupon n is a new type. To compute P(N),
condition on which type of coupon it is. That is, with $T_j$ being the event that
coupon n is a type j coupon, we have

$$P(N) = \sum_{j=1}^k P(N\mid T_j)P(T_j) = \sum_{j=1}^k (1 - p_j)^{n-1} p_j$$

where $P(N\mid T_j)$ was computed by noting that the conditional probability that
coupon n is a new type given that it is a type j coupon is equal to the conditional
probability that each of the first $n - 1$ coupons is not a type j coupon, which by
independence is equal to $(1 - p_j)^{n-1}$.
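
The conditioning argument translates directly into code, and a short simulation gives an independent check. In this Python sketch the type probabilities p = (0.5, 0.3, 0.2) and n = 4 are arbitrary illustrative values, and the function names are our own:

```python
import random


def prob_new_type(p, n):
    """P(coupon n is a type not seen among the first n - 1), by conditioning on its type."""
    return sum(pj * (1 - pj) ** (n - 1) for pj in p)


def simulate_new_type(p, n, trials=100_000, seed=3):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    types = range(len(p))
    hits = 0
    for _ in range(trials):
        coupons = rng.choices(types, weights=p, k=n)   # n independent coupons
        if coupons[-1] not in coupons[:-1]:            # the nth one is a new type
            hits += 1
    return hits / trials


p = [0.5, 0.3, 0.2]   # illustrative probabilities for k = 3 types
print(prob_new_type(p, 4), simulate_new_type(p, 4))
```

With these values the formula gives 0.5(0.5)^3 + 0.3(0.7)^3 + 0.2(0.8)^3 = 0.2678, and the simulated estimate should be close to it.
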


As indicated by the coin flip example, P(A|B), the conditional probability
of A, given that B occurred, is not generally equal to P(A), the unconditional
probability of A. In other words, knowing that B has occurred generally changes
the probability that A occurs (what if they were mutually exclusive?). In the
special case where P(A|B) is equal to P(A), we say that A and B are independent. 
Since P(A|B) = P(AB)/P(B), we see that A is independent of B if 


P(AB) = P(A)P(B) 


Since this relation is symmetric in A and B, it follows that whenever A is 
independent of B, B is independent of A. 
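
In the coin flip example, "first flip heads" and "second flip heads" are independent, whereas "both heads" and "first flip heads" are not. A small Python sketch that tests the product rule P(AB) = P(A)P(B) by enumeration, assuming equally likely outcomes (the helper names are ours):

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))   # (H, H), (H, T), (T, H), (T, T)
p_outcome = Fraction(1, len(S))     # each outcome assumed equally likely


def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return p_outcome * sum(1 for o in S if event(o))


def first_heads(o):
    return o[0] == "H"


def second_heads(o):
    return o[1] == "H"


def both_heads(o):
    return o == ("H", "H")


def independent(e, f):
    """Check the product rule P(EF) = P(E)P(F)."""
    return P(lambda o: e(o) and f(o)) == P(e) * P(f)


print(independent(first_heads, second_heads))   # True
print(independent(both_heads, first_heads))     # False
```

In the second case P(AB) = 1/4 while P(A)P(B) = (1/4)(1/2) = 1/8, matching the earlier finding that P(A|B) = 1/2 differs from P(A) = 1/4.
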


2.4 Random Variables 


When an experiment is performed we are sometimes primarily concerned about 
the value of some numerical quantity determined by the result. These quantities 
of interest that are determined by the results of the experiment are known as 
random variables. 

The cumulative distribution function, or more simply the distribution function, 
F of the random variable X is defined for any real number x by 


$$F(x) = P\{X \le x\}$$


A random variable that can take either a finite or at most a countable number 
of possible values is said to be discrete. For a discrete random variable X we 
define its probability mass function p(x) by 


$$p(x) = P\{X = x\}$$

If X is a discrete random variable that takes on one of the possible values
$x_1, x_2, \ldots$, then, since X must take on one of these values, we have

$$\sum_{i=1}^{\infty} p(x_i) = 1$$


Example 2a Suppose that X takes on one of the values 1, 2, or 3. If 


p=3, =i 


then, since p(1) + p(2) + p(3) = 1, it follows that p(3) = =. 

Whereas a discrete random variable assumes at most a countable set of possible 
values, we often have to consider random variables whose set of possible values 
is an interval. We say that the random variable X is a continuous random variable 
if there is a nonnegative function f(x) defined for all real numbers x and having 
the property that for any set C of real numbers 


$$P\{X \in C\} = \int_C f(x)\,dx \tag{2.1}$$


The function f is called the probability density function of the random variable X. 
The relationship between the cumulative distribution F(·) and the probability
density f(·) is expressed by

$$F(a) = P\{X \in (-\infty, a]\} = \int_{-\infty}^{a} f(x)\,dx$$

Differentiating both sides yields

$$\frac{d}{da} F(a) = f(a)$$


That is, the density is the derivative of the cumulative distribution function. A 
somewhat more intuitive interpretation of the density function may be obtained 
from Equation (2.1) as follows: 


$$P\left\{a - \frac{\epsilon}{2} \le X \le a + \frac{\epsilon}{2}\right\} = \int_{a-\epsilon/2}^{a+\epsilon/2} f(x)\,dx \approx \epsilon f(a)$$

when $\epsilon$ is small. In other words, the probability that X will be contained in an
interval of length $\epsilon$ around the point a is approximately $\epsilon f(a)$. From this, we
see that f(a) is a measure of how likely it is that the random variable will be
near a.
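
A quick numerical check of this approximation, assuming (for illustration only) that X is exponential with rate 1, so that F(x) = 1 - e^{-x} and f(x) = e^{-x} for x ≥ 0, and taking a = 1:

```python
import math


def F(x):
    """Distribution function of an exponential random variable with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def f(x):
    """Its density, the derivative of F."""
    return math.exp(-x) if x >= 0 else 0.0


a = 1.0
for eps in (0.1, 0.01, 0.001):
    exact = F(a + eps / 2) - F(a - eps / 2)   # P{a - eps/2 <= X <= a + eps/2}
    print(eps, exact, eps * f(a), exact / (eps * f(a)))
```

The last column approaches 1 as ε shrinks; for this density the ratio works out to (e^{ε/2} - e^{-ε/2})/ε, which is about 1 + ε²/24.
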

In many experiments we are interested not only in probability distribution 
functions of individual random variables, but also in the relationships between 
two or more of them. In order to specify the relationship between two random
variables, we define the joint cumulative probability distribution function of X
and Y by

$$F(x, y) = P\{X \le x, Y \le y\}$$

Thus, F(x, y) specifies the probability that X is less than or equal to x and
simultaneously Y is less than or equal to y.

If X and Y are both discrete random variables, then we define the joint
probability mass function of X and Y by

$$p(x, y) = P\{X = x, Y = y\}$$


Similarly, we say that X and Y are jointly continuous, with joint probability 
density function f(x, y), if for any sets of real numbers C and D 


$$P\{X \in C, Y \in D\} = \int_{x \in C} \int_{y \in D} f(x, y)\,dy\,dx$$

The random variables X and Y are said to be independent if for any two sets 
of real numbers C and D 


$$P\{X \in C, Y \in D\} = P\{X \in C\}P\{Y \in D\}$$

That is, X and Y are independent if for all sets C and D the events $A = \{X \in C\}$
and $B = \{Y \in D\}$ are independent. Loosely speaking, X and Y are independent if
knowing the value of one of them does not affect the probability distribution of
the other. Random variables that are not independent are said to be dependent.
Using the axioms of probability, we can show that the discrete random variables
X and Y will be independent if and only if, for all x, y,


P{X =x, Y =y} = P{X = x}P{Y = y} 


Similarly, if X and Y are jointly continuous with density function f(x, y), then 
they will be independent if and only if, for all x, y, 


$$f(x, y) = f_X(x)\,f_Y(y)$$

where $f_X(x)$ and $f_Y(y)$ are the density functions of X and Y, respectively.


2.5 Expectation 


One of the most useful concepts in probability is that of the expectation of a random variable. If X is a discrete random variable that takes on one of the possible values $x_1, x_2, \ldots$, then the expectation or expected value of X, also called the mean of X and denoted by E[X], is defined by

$$E[X] = \sum_i x_i P\{X = x_i\} \tag{2.2}$$


In words, the expected value of X is a weighted average of the possible values 
that X can take on, each value being weighted by the probability that X assumes 
it. For example, if the probability mass function of X is given by 


$$p(0) = \frac{1}{2} = p(1)$$

then

$$E[X] = 0\left(\frac{1}{2}\right) + 1\left(\frac{1}{2}\right) = \frac{1}{2}$$

is just the ordinary average of the two possible values 0 and 1 that X can assume. On the other hand, if

$$p(0) = \frac{1}{3}, \qquad p(1) = \frac{2}{3}$$

then

$$E[X] = 0\left(\frac{1}{3}\right) + 1\left(\frac{2}{3}\right) = \frac{2}{3}$$

is a weighted average of the two possible values 0 and 1 where the value 1 is given twice as much weight as the value 0 since p(1) = 2p(0).


Example 2b If I is an indicator random variable for the event A, that is, if

$$I = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{if } A \text{ does not occur} \end{cases}$$

then

$$E[I] = 1\,P(A) + 0\,P(A^c) = P(A)$$

Hence, the expectation of the indicator random variable for the event A is just the probability that A occurs. □


If X is a continuous random variable having probability density function f, 
then, analogous to Equation (2.2), we define the expected value of X by 


$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$

Example 2c If the probability density function of X is given by

$$f(x) = \begin{cases} 3x^2 & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

then

$$E[X] = \int_0^1 3x^3\,dx = \frac{3}{4}$$

□


Suppose now that we wanted to determine the expected value not of the 
random variable X but of the random variable g(X), where g is some given 
function. Since g(X) takes on the value g(x) when X takes on the value x, it 
seems intuitive that E[g(X)] should be a weighted average of the possible values 
g(x) with, for a given x, the weight given to g(x) being equal to the probability 
(or probability density in the continuous case) that X will equal x. Indeed, the 
preceding can be shown to be true and we thus have the following result. 


Proposition If X is a discrete random variable having probability mass function p(x), then

$$E[g(X)] = \sum_x g(x)\,p(x)$$

whereas if X is continuous with probability density function f(x), then

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$
A consequence of the above proposition is the following.

Corollary If a and b are constants, then

$$E[aX + b] = aE[X] + b$$

Proof In the discrete case

$$\begin{aligned} E[aX + b] &= \sum_x (ax + b)\,p(x) \\ &= a\sum_x x\,p(x) + b\sum_x p(x) \\ &= aE[X] + b \end{aligned}$$

Since the proof in the continuous case is similar, the result is established. □
It can be shown that expectation is a linear operation in the sense that for any two random variables $X_1$ and $X_2$

$$E[X_1 + X_2] = E[X_1] + E[X_2]$$

which easily generalizes to give

$$E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]$$


2.6 Variance 


Whereas E[X], the expected value of the random variable X, is a weighted 
average of the possible values of X, it yields no information about the variation 
of these values. One way of measuring this variation is to consider the average 
value of the square of the difference between X and E[X]. We are thus led to 
the following definition. 


Definition If X is a random variable with mean $\mu$, then the variance of X, denoted by Var(X), is defined by

$$\operatorname{Var}(X) = E[(X - \mu)^2]$$

An alternative formula for Var(X) is derived as follows:

$$\begin{aligned} \operatorname{Var}(X) &= E[(X - \mu)^2] \\ &= E[X^2 - 2\mu X + \mu^2] \\ &= E[X^2] - E[2\mu X] + E[\mu^2] \\ &= E[X^2] - 2\mu E[X] + \mu^2 \\ &= E[X^2] - \mu^2 \end{aligned}$$

That is,

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$


A useful identity, whose proof is left as an exercise, is that for any constants 
a and b 


$$\operatorname{Var}(aX + b) = a^2\operatorname{Var}(X)$$
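The definitions and identities above can be checked exactly on the two-point pmf p(0) = 1/3, p(1) = 2/3 used earlier in this section. The following Python sketch is only an illustration: the helper names `expectation` and `variance` are ours, not the text's, and `fractions.Fraction` is used so that the identities hold with exact equality rather than up to rounding.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) p(x) over a finite pmf given as {x: p(x)}."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    mu = expectation(pmf)
    return expectation(pmf, lambda x: x * x) - mu * mu

# The pmf p(0) = 1/3, p(1) = 2/3 from the expectation example.
pmf = {0: Fraction(1, 3), 1: Fraction(2, 3)}
mu = expectation(pmf)
var_by_definition = expectation(pmf, lambda x: (x - mu) ** 2)  # E[(X - mu)^2]
var_by_shortcut = variance(pmf)                                # E[X^2] - mu^2

# pmf of Y = aX + b; assumes a != 0 so that distinct x give distinct y.
a, b = 3, 5
pmf_y = {a * x + b: p for x, p in pmf.items()}

print(mu, var_by_definition, var_by_shortcut)   # 2/3 2/9 2/9
print(expectation(pmf_y), variance(pmf_y))      # 7 2
```

The second line of output agrees with aE[X] + b = 3(2/3) + 5 = 7 and $a^2\operatorname{Var}(X) = 9(2/9) = 2$.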
Whereas the expected value of a sum of random variables is equal to the sum
of the expectations, the corresponding result is not, in general, true for variances. 
It is, however, true in the important special case where the random variables 
are independent. Before proving this let us define the concept of the covariance 
between two random variables. 


Definition The covariance of two random variables X and Y, denoted Cov(X, Y), is defined by

$$\operatorname{Cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)]$$

where $\mu_x = E[X]$ and $\mu_y = E[Y]$.


A useful expression for Cov(X, Y) is obtained by expanding the right side 
of the above equation and then making use of the linearity of expectation. This 
yields 


$$\begin{aligned} \operatorname{Cov}(X, Y) &= E[XY - \mu_x Y - X\mu_y + \mu_x\mu_y] \\ &= E[XY] - E[X]E[Y] \end{aligned} \tag{2.3}$$


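We now derive an expression for Var(X + Y) in terms of the individual variances of X and Y and the covariance between them. Since

$$E[X + Y] = E[X] + E[Y] = \mu_x + \mu_y$$

we see that

$$\begin{aligned} \operatorname{Var}(X + Y) &= E[(X + Y - \mu_x - \mu_y)^2] \\ &= E[(X - \mu_x)^2 + (Y - \mu_y)^2 + 2(X - \mu_x)(Y - \mu_y)] \\ &= E[(X - \mu_x)^2] + E[(Y - \mu_y)^2] + 2E[(X - \mu_x)(Y - \mu_y)] \\ &= \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y) \end{aligned} \tag{2.4}$$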


We end this section by showing that the variance of the sum of independent 
random variables is equal to the sum of their variances. 


Proposition If X and Y are independent random variables, then

$$\operatorname{Cov}(X, Y) = 0$$

and so, from Equation (2.4),

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$$
Proof From Equation (2.3) it follows that we need to show that E[XY] = E[X]E[Y]. Now in the discrete case,

$$\begin{aligned} E[XY] &= \sum_j \sum_i x_i y_j P\{X = x_i, Y = y_j\} \\ &= \sum_j \sum_i x_i y_j P\{X = x_i\}P\{Y = y_j\} \qquad \text{by independence} \\ &= \sum_j y_j P\{Y = y_j\} \sum_i x_i P\{X = x_i\} \\ &= E[Y]E[X] \end{aligned}$$

Since a similar argument holds in the continuous case, the result is proved. □


The correlation between two random variables X and Y, denoted as Corr(X, Y), is defined by

$$\operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$$
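As a sketch of Equation (2.3) and the correlation definition, the following Python code computes Cov(X, Y) and Corr(X, Y) from a finite joint probability mass function. Both joint pmfs are hypothetical examples chosen for illustration: in the first, Y equals X, so the correlation is 1; in the second, X and Y are independent, so the covariance is 0, as the preceding proposition asserts.

```python
import math
from fractions import Fraction

def cov_and_corr(joint):
    """Covariance via Equation (2.3) and correlation for a finite joint pmf {(x, y): p}."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    var_x = sum(x * x * p for (x, _), p in joint.items()) - ex ** 2
    var_y = sum(y * y * p for (_, y), p in joint.items()) - ey ** 2
    cov = exy - ex * ey                          # E[XY] - E[X]E[Y]
    corr = float(cov) / math.sqrt(var_x * var_y)
    return cov, corr

# Hypothetical example 1: Y = X, where X is 0 or 1 with probability 1/2 each.
same = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

# Hypothetical example 2: independent X in {0, 1} and Y in {0, 1, 2},
# so the joint pmf is the product of the uniform marginals.
indep = {(x, y): Fraction(1, 2) * Fraction(1, 3) for x in (0, 1) for y in (0, 1, 2)}

print(cov_and_corr(same))    # (Fraction(1, 4), 1.0)
print(cov_and_corr(indep))   # (Fraction(0, 1), 0.0)
```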


2.7 Chebyshev’s Inequality and the Laws of Large Numbers 
We start with a result known as Markov’s inequality. 


Proposition Markov’s Inequality If X takes on only nonnegative values, then for any value a > 0

$$P\{X \ge a\} \le \frac{E[X]}{a}$$

Proof Define the random variable Y by

$$Y = \begin{cases} a, & \text{if } X \ge a \\ 0, & \text{if } X < a \end{cases}$$

Because $X \ge 0$, it easily follows that

$$X \ge Y$$

Taking expectations of the preceding inequality yields

$$E[X] \ge E[Y] = aP\{X \ge a\}$$

and the result is proved. □
As a corollary we have Chebyshev’s inequality, which states that the probability that a random variable differs from its mean by more than k of its standard deviations is bounded by $1/k^2$, where the standard deviation of a random variable is defined to be the square root of its variance.


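Corollary Chebyshev’s Inequality If X is a random variable having mean $\mu$ and variance $\sigma^2$, then for any value k > 0,

$$P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}$$

Proof Since $(X - \mu)^2/\sigma^2$ is a nonnegative random variable whose mean is

$$E\left[\frac{(X - \mu)^2}{\sigma^2}\right] = \frac{E[(X - \mu)^2]}{\sigma^2} = 1$$

we obtain from Markov’s inequality that

$$P\left\{\frac{(X - \mu)^2}{\sigma^2} \ge k^2\right\} \le \frac{1}{k^2}$$

The result now follows since the inequality $(X - \mu)^2/\sigma^2 \ge k^2$ is equivalent to the inequality $|X - \mu| \ge k\sigma$. □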
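To see how conservative these bounds can be, the sketch below compares them with exact tail probabilities for an exponential random variable with rate 1, which has mean 1, variance 1, and $P\{X > x\} = e^{-x}$ (exponential random variables are discussed in Section 2.9). The choice of distribution and the function names are illustrative assumptions.

```python
import math

def markov_bound(mean, a):
    """Markov: P{X >= a} <= E[X]/a for nonnegative X and a > 0."""
    return mean / a

def chebyshev_bound(k):
    """Chebyshev: P{|X - mu| >= k sigma} <= 1/k^2."""
    return 1.0 / k ** 2

def exp1_tail(a):
    """Exact P{X >= a} for an exponential random variable with rate 1."""
    return math.exp(-a)

def exp1_two_sided(k):
    """Exact P{|X - 1| >= k} for an exponential with rate 1 (so mu = sigma = 1)."""
    upper = math.exp(-(1.0 + k))                           # P{X >= 1 + k}
    lower = 1.0 - math.exp(-(1.0 - k)) if k < 1 else 0.0   # P{X <= 1 - k}; empty when k >= 1
    return upper + lower

for a in (0.5, 2.0, 5.0):
    print(f"a={a}: exact {exp1_tail(a):.4f} <= Markov {markov_bound(1.0, a):.4f}")
for k in (0.5, 2.0, 3.0):
    print(f"k={k}: exact {exp1_two_sided(k):.4f} <= Chebyshev {chebyshev_bound(k):.4f}")
```

For large a or k the exact tails are far smaller than the bounds; the value of the inequalities is that they hold for every distribution with the stated mean (and variance).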


We now use Chebyshev’s inequality to prove the weak law of large numbers, which states that the probability that the average of the first n terms of a sequence of independent and identically distributed random variables differs from its mean by more than $\varepsilon$ goes to 0 as n goes to infinity.


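Theorem The Weak Law of Large Numbers Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables having mean $\mu$. Then, for any $\varepsilon > 0$,

$$P\left\{\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| > \varepsilon\right\} \to 0 \quad \text{as } n \to \infty$$

Proof We give a proof under the additional assumption that the random variables $X_i$ have a finite variance $\sigma^2$. Now

$$E\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{1}{n}\left(E[X_1] + \cdots + E[X_n]\right) = \mu$$

and

$$\operatorname{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{1}{n^2}\left[\operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)\right] = \frac{\sigma^2}{n}$$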
where the above equation makes use of the fact that the variance of the sum of independent random variables is equal to the sum of their variances. Hence, from Chebyshev’s inequality, it follows that for any positive k

$$P\left\{\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \frac{k\sigma}{\sqrt{n}}\right\} \le \frac{1}{k^2}$$

Hence, for any $\varepsilon > 0$, by letting k be such that $k\sigma/\sqrt{n} = \varepsilon$, that is, by letting $k^2 = n\varepsilon^2/\sigma^2$, we see that

$$P\left\{\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right\} \le \frac{\sigma^2}{n\varepsilon^2}$$

which establishes the result. □
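The weak law can be watched in action with a short simulation. This Python sketch assumes uniform (0, 1) random variables (mean 1/2, variance 1/12) and uses the standard library's `random.Random` with a fixed seed; it prints the sample average for increasing n next to the Chebyshev bound $\sigma^2/(n\varepsilon^2)$ from the proof, with $\varepsilon = 0.01$ chosen for illustration.

```python
import random

def sample_mean(n, rng):
    """Average of n independent uniform(0, 1) values (mean 1/2, variance 1/12)."""
    return sum(rng.random() for _ in range(n)) / n

rng = random.Random(12345)  # fixed seed so the run is reproducible
eps = 0.01
for n in (10, 1_000, 100_000):
    avg = sample_mean(n, rng)
    # Chebyshev bound from the proof, sigma^2 / (n eps^2), capped at 1.
    bound = min(1.0, (1.0 / 12.0) / (n * eps ** 2))
    print(f"n={n:>7}: average={avg:.4f}, P(|avg - 0.5| >= {eps}) <= {bound:.4f}")
```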


A generalization of the weak law is the strong law of large numbers, which states that, with probability 1,

$$\lim_{n \to \infty} \frac{X_1 + \cdots + X_n}{n} = \mu$$

That is, with certainty, the long-run average of a sequence of independent and identically distributed random variables will converge to its mean.
2.8 Some Discrete Random Variables 


There are certain types of random variables that frequently appear in applications. 
In this section we survey some of the discrete ones. 


Binomial Random Variables 


Suppose that n independent trials, each of which results in a “success” with probability p, are to be performed. If X represents the number of successes that occur in the n trials, then X is said to be a binomial random variable with parameters (n, p). Its probability mass function is given by

$$p_i \equiv P\{X = i\} = \binom{n}{i} p^i (1 - p)^{n-i}, \qquad i = 0, 1, \ldots, n \tag{2.5}$$

where

$$\binom{n}{i} = \frac{n!}{i!\,(n - i)!}$$

is the binomial coefficient, equal to the number of different subsets of i elements that can be chosen from a set of n elements.
The validity of Equation (2.5) can be seen by first noting that the probability of any particular sequence of outcomes that results in i successes and $n - i$ failures is, by the assumed independence of trials, $p^i(1 - p)^{n-i}$. Equation (2.5) then follows since there are $\binom{n}{i}$ different sequences of the n outcomes that result in i successes and $n - i$ failures, which can be seen by noting that there are $\binom{n}{i}$ different choices of the i trials that result in successes.

A binomial (1, p) random variable is called a Bernoulli random variable. Since a binomial (n, p) random variable X represents the number of successes in n independent trials, each of which results in a success with probability p, we can represent it as follows:

$$X = \sum_{i=1}^n X_i \tag{2.6}$$

where

$$X_i = \begin{cases} 1 & \text{if the } i\text{th trial is a success} \\ 0 & \text{otherwise} \end{cases}$$

Now

$$\begin{aligned} E[X_i] &= P\{X_i = 1\} = p \\ \operatorname{Var}(X_i) &= E[X_i^2] - (E[X_i])^2 \\ &= p - p^2 = p(1 - p) \end{aligned}$$

where the above equation uses the fact that $X_i^2 = X_i$ (since $0^2 = 0$ and $1^2 = 1$). Hence the representation (2.6) yields that, for a binomial (n, p) random variable X,

$$E[X] = \sum_{i=1}^n E[X_i] = np$$

$$\operatorname{Var}(X) = \sum_{i=1}^n \operatorname{Var}(X_i) = np(1 - p) \qquad \text{since the } X_i \text{ are independent}$$
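The following recursive formula expressing $p_{i+1}$ in terms of $p_i$ is useful when computing the binomial probabilities:

$$\begin{aligned} p_{i+1} &= \frac{n!}{(n - i - 1)!\,(i + 1)!} p^{i+1} (1 - p)^{n-i-1} \\ &= \frac{n!\,(n - i)}{(n - i)!\,i!\,(i + 1)} p^i (1 - p)^{n-i} \frac{p}{1 - p} \\ &= \frac{n - i}{i + 1}\,\frac{p}{1 - p}\,p_i \end{aligned}$$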
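The recursion for $p_{i+1}$ translates directly into code. The following Python sketch (the function name `binomial_pmf` is illustrative) builds all the binomial probabilities from $p_0 = (1 - p)^n$ and compares them with the closed form (2.5). It assumes $0 \le p < 1$, since the ratio $p/(1 - p)$ is undefined at p = 1. For very large n, $(1 - p)^n$ may underflow to zero in floating point.

```python
import math

def binomial_pmf(n, p):
    """Return [p_0, ..., p_n] for a binomial (n, p) with 0 <= p < 1, built from
    p_0 = (1 - p)^n and p_{i+1} = (n - i)/(i + 1) * p/(1 - p) * p_i."""
    probs = [(1.0 - p) ** n]
    ratio = p / (1.0 - p)
    for i in range(n):
        probs.append(probs[-1] * (n - i) / (i + 1) * ratio)
    return probs

probs = binomial_pmf(10, 0.3)
direct = [math.comb(10, i) * 0.3 ** i * 0.7 ** (10 - i) for i in range(11)]
print(max(abs(a - b) for a, b in zip(probs, direct)))  # rounding error only
```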
Poisson Random Variables

A random variable X that takes on one of the values 0, 1, 2, ... is said to be a Poisson random variable with parameter $\lambda$, $\lambda > 0$, if its probability mass function is given by

$$p_i = P\{X = i\} = e^{-\lambda}\frac{\lambda^i}{i!}, \qquad i = 0, 1, \ldots$$

The symbol e, defined by $e = \lim_{n \to \infty}(1 + 1/n)^n$, is a famous constant in mathematics that is roughly equal to 2.7183.

Poisson random variables have a wide range of applications. One reason for this is that such random variables may be used to approximate the distribution of the number of successes in a large number of trials (which are either independent or at most “weakly dependent”) when each trial has a small probability of being a success. To see why this is so, suppose that X is a binomial random variable with parameters (n, p), and so represents the number of successes in n independent trials when each trial is a success with probability p, and let $\lambda = np$. Then

$$\begin{aligned} P\{X = i\} &= \frac{n!}{(n - i)!\,i!} p^i (1 - p)^{n-i} \\ &= \frac{n!}{(n - i)!\,i!} \left(\frac{\lambda}{n}\right)^i \left(1 - \frac{\lambda}{n}\right)^{n-i} \\ &= \frac{n(n - 1)\cdots(n - i + 1)}{n^i}\,\frac{\lambda^i}{i!}\,\frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^i} \end{aligned}$$


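Now for n large and p small,

$$\left(1 - \frac{\lambda}{n}\right)^n \approx e^{-\lambda}, \qquad \frac{n(n - 1)\cdots(n - i + 1)}{n^i} \approx 1, \qquad \left(1 - \frac{\lambda}{n}\right)^i \approx 1$$

Hence, for n large and p small,

$$P\{X = i\} \approx e^{-\lambda}\frac{\lambda^i}{i!}$$

Since the mean and variance of a binomial random variable Y are given by

$$E[Y] = np, \qquad \operatorname{Var}(Y) = np(1 - p) \approx np \quad \text{for small } p$$

it is intuitive, given the relationship between binomial and Poisson random variables, that for a Poisson random variable, X, having parameter $\lambda$,

$$E[X] = \operatorname{Var}(X) = \lambda$$

An analytic proof of the above is left as an exercise.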
To compute the Poisson probabilities we make use of the following recursive formula:

$$\frac{p_{i+1}}{p_i} = \frac{e^{-\lambda}\lambda^{i+1}/(i + 1)!}{e^{-\lambda}\lambda^i/i!} = \frac{\lambda}{i + 1}$$

or, equivalently,

$$p_{i+1} = \frac{\lambda}{i + 1}\,p_i, \qquad i \ge 0$$
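A sketch of the Poisson recursion in Python follows, starting from $p_0 = e^{-\lambda}$ (the function name `poisson_pmf` is illustrative). As an illustration of the approximation argued above, it also prints binomial probabilities with n = 1000 and p = 0.003 (so $\lambda = 3$) beside the Poisson values; these parameter choices are arbitrary.

```python
import math

def poisson_pmf(lam, imax):
    """Return [p_0, ..., p_imax] for a Poisson(lam), using p_0 = e^{-lam}
    and p_{i+1} = lam/(i + 1) * p_i."""
    probs = [math.exp(-lam)]
    for i in range(imax):
        probs.append(probs[-1] * lam / (i + 1))
    return probs

# Poisson approximation to a binomial (n, p) with n large, p small, lam = n p.
n, p = 1000, 0.003
lam = n * p
pois = poisson_pmf(lam, 10)
for i in range(11):
    binom = math.comb(n, i) * p ** i * (1 - p) ** (n - i)
    print(f"i={i:2d}  binomial={binom:.5f}  Poisson={pois[i]:.5f}")
```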


Suppose that a certain number, N, of events will occur, where N is a Poisson random variable with mean $\lambda$. Suppose further that each event that occurs will, independently, be either a type 1 event with probability p or a type 2 event with probability $1 - p$. Thus, if $N_i$ is equal to the number of the events that are type i, i = 1, 2, then $N = N_1 + N_2$. A useful result is that the random variables $N_1$ and $N_2$ are independent Poisson random variables, with respective means

$$E[N_1] = \lambda p, \qquad E[N_2] = \lambda(1 - p)$$

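To prove this result, let n and m be nonnegative integers, and consider the joint probability $P\{N_1 = n, N_2 = m\}$. Because $P\{N_1 = n, N_2 = m \mid N \ne n + m\} = 0$, conditioning on whether $N = n + m$ yields

$$\begin{aligned} P\{N_1 = n, N_2 = m\} &= P\{N_1 = n, N_2 = m \mid N = n + m\}\,P\{N = n + m\} \\ &= P\{N_1 = n, N_2 = m \mid N = n + m\}\,e^{-\lambda}\frac{\lambda^{n+m}}{(n + m)!} \end{aligned}$$

However, given that $N = n + m$, because each of the $n + m$ events is independently either a type 1 event with probability p or type 2 with probability $1 - p$, it follows that the number of them that are type 1 is a binomial random variable with parameters $n + m$, p. Consequently,

$$\begin{aligned} P\{N_1 = n, N_2 = m\} &= \binom{n + m}{n} p^n (1 - p)^m e^{-\lambda}\frac{\lambda^{n+m}}{(n + m)!} \\ &= \frac{(n + m)!}{n!\,m!} p^n (1 - p)^m e^{-\lambda p} e^{-\lambda(1 - p)} \frac{\lambda^n \lambda^m}{(n + m)!} \\ &= e^{-\lambda p}\frac{(\lambda p)^n}{n!}\,e^{-\lambda(1 - p)}\frac{(\lambda(1 - p))^m}{m!} \end{aligned}$$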
Summing over m yields that

$$\begin{aligned} P\{N_1 = n\} &= \sum_m P\{N_1 = n, N_2 = m\} \\ &= e^{-\lambda p}\frac{(\lambda p)^n}{n!} \sum_m e^{-\lambda(1 - p)}\frac{(\lambda(1 - p))^m}{m!} \\ &= e^{-\lambda p}\frac{(\lambda p)^n}{n!} \end{aligned}$$

Similarly,

$$P\{N_2 = m\} = e^{-\lambda(1 - p)}\frac{(\lambda(1 - p))^m}{m!}$$

thus verifying that $N_1$ and $N_2$ are indeed independent Poisson random variables with respective means $\lambda p$ and $\lambda(1 - p)$.

The preceding result generalizes when each of the Poisson number of events is independently one of the types 1, ..., r, with respective probabilities $p_1, \ldots, p_r$, $\sum_{i=1}^r p_i = 1$. With $N_i$ equal to the number of the events that are type i, i = 1, ..., r, it is similarly shown that $N_1, \ldots, N_r$ are independent Poisson random variables, with respective means

$$E[N_i] = \lambda p_i, \qquad i = 1, \ldots, r$$
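The factorization derived above can be confirmed numerically. The Python sketch below computes $P\{N_1 = n, N_2 = m\}$ by conditioning on $N = n + m$ (a binomial probability times a Poisson probability) and compares it with the product of Poisson($\lambda p$) and Poisson($\lambda(1 - p)$) probabilities. The values $\lambda = 4$ and p = 0.3, and the function names, are arbitrary illustrative choices.

```python
import math

def poisson_prob(k, mean):
    """P{Poisson(mean) = k}."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def joint_by_conditioning(n, m, lam, p):
    """P{N1 = n, N2 = m} = C(n + m, n) p^n (1 - p)^m * P{N = n + m}."""
    return math.comb(n + m, n) * p ** n * (1 - p) ** m * poisson_prob(n + m, lam)

def joint_as_product(n, m, lam, p):
    """P{N1 = n} P{N2 = m} for independent Poisson(lam p) and Poisson(lam (1 - p))."""
    return poisson_prob(n, lam * p) * poisson_prob(m, lam * (1 - p))

lam, p = 4.0, 0.3
worst = max(
    abs(joint_by_conditioning(n, m, lam, p) - joint_as_product(n, m, lam, p))
    for n in range(15)
    for m in range(15)
)
print(worst)  # floating-point rounding only
```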


Geometric Random Variables

Consider independent trials, each of which is a success with probability p. If X represents the number of the first trial that is a success, then

$$P\{X = n\} = p(1 - p)^{n-1}, \qquad n \ge 1 \tag{2.7}$$

which is easily obtained by noting that in order for the first success to occur on the nth trial, the first $n - 1$ must all be failures and the nth a success. Equation (2.7) now follows because the trials are independent.

A random variable whose probability mass function is given by (2.7) is said to be a geometric random variable with parameter p. The mean of the geometric is obtained as follows:

$$E[X] = \sum_{n=1}^{\infty} np(1 - p)^{n-1} = \frac{1}{p}$$
where the above equation made use of the algebraic identity, for 0 < x < 1,

$$\sum_{n=1}^{\infty} n x^{n-1} = \frac{d}{dx}\left(\sum_{n=0}^{\infty} x^n\right) = \frac{d}{dx}\left(\frac{1}{1 - x}\right) = \frac{1}{(1 - x)^2}$$

It is also not difficult to show that

$$\operatorname{Var}(X) = \frac{1 - p}{p^2}$$
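The mean and variance formulas can be checked by summing the series directly. This Python sketch truncates the sums after a fixed number of terms (an assumption that is harmless when $(1 - p)$ raised to that number of terms is negligible) and uses p = 0.2, for which E[X] = 5 and Var(X) = 20. The function name is illustrative.

```python
def geometric_moments(p, terms=10_000):
    """Truncated sums of n P{X = n} and n^2 P{X = n}, where P{X = n} = p (1 - p)^(n - 1).
    Returns (mean, variance)."""
    mean = second = 0.0
    prob = p                    # P{X = 1}
    for n in range(1, terms + 1):
        mean += n * prob
        second += n * n * prob
        prob *= 1.0 - p         # P{X = n + 1} = (1 - p) P{X = n}
    return mean, second - mean ** 2

p = 0.2
mean, var = geometric_moments(p)
print(mean, 1 / p)              # both close to 5
print(var, (1 - p) / p ** 2)    # both close to 20
```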


The Negative Binomial Random Variable 


If we let X denote the number of trials needed to amass a total of r successes 
when each trial is independently a success with probability p, then X is said to be 
a negative binomial, sometimes called a Pascal, random variable with parameters 
p and r. The probability mass function of such a random variable is given by 
the following: 


$$P\{X = n\} = \binom{n - 1}{r - 1} p^r (1 - p)^{n-r}, \qquad n \ge r \tag{2.8}$$

To see why Equation (2.8) is valid note that in order for it to take exactly n trials to amass r successes, the first $n - 1$ trials must result in exactly $r - 1$ successes, which has probability $\binom{n-1}{r-1} p^{r-1}(1 - p)^{n-r}$, and then the nth trial must be a success, which has probability p.

If we let $X_i$, i = 1, ..., r, denote the number of trials needed after the $(i - 1)$st success to obtain the ith success, then it is easy to see that they are independent geometric random variables with common parameter p. Since

$$X = \sum_{i=1}^r X_i$$

we see that

$$E[X] = \sum_{i=1}^r E[X_i] = \frac{r}{p}$$

$$\operatorname{Var}(X) = \sum_{i=1}^r \operatorname{Var}(X_i) = \frac{r(1 - p)}{p^2}$$

where the preceding made use of the corresponding results for geometric random variables.
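As a check on Equation (2.8) and on the moments r/p and $r(1 - p)/p^2$, the following Python sketch sums the negative binomial probabilities over a truncated range of n. The values r = 3, p = 0.4, and the cutoff of 400 are illustrative choices; for these values the neglected tail is far below floating-point precision.

```python
import math

def neg_binomial_pmf(n, r, p):
    """P{X = n} = C(n - 1, r - 1) p^r (1 - p)^(n - r) for n >= r, else 0 (Equation 2.8)."""
    if n < r:
        return 0.0
    return math.comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)

r, p = 3, 0.4
support = range(r, 400)  # truncated support; the remaining tail is negligible here
total = sum(neg_binomial_pmf(n, r, p) for n in support)
mean = sum(n * neg_binomial_pmf(n, r, p) for n in support)
var = sum(n * n * neg_binomial_pmf(n, r, p) for n in support) - mean ** 2
print(total, mean, r / p, var, r * (1 - p) / p ** 2)
```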
Hypergeometric Random Variables

Consider an urn containing $N + M$ balls, of which N are light colored and M are dark colored. If a sample of size n is randomly chosen [in the sense that each of the $\binom{N+M}{n}$ subsets of size n is equally likely to be chosen] then X, the number of light colored balls selected, has probability mass function

$$P\{X = i\} = \frac{\binom{N}{i}\binom{M}{n - i}}{\binom{N + M}{n}}$$

A random variable X whose probability mass function is given by the preceding equation is called a hypergeometric random variable.


Suppose that the n balls are chosen sequentially. If we let

$$X_i = \begin{cases} 1 & \text{if the } i\text{th selection is light} \\ 0 & \text{otherwise} \end{cases}$$

then

$$X = \sum_{i=1}^n X_i \tag{2.9}$$

and so

$$E[X] = \sum_{i=1}^n E[X_i] = \frac{nN}{N + M}$$

where the above equation uses the fact that, by symmetry, the ith selection is equally likely to be any of the $N + M$ balls, and so $E[X_i] = P\{X_i = 1\} = N/(N + M)$.

Since the $X_i$ are not independent (why not?), the utilization of the representation (2.9) to compute Var(X) involves covariance terms. The end product can be shown to yield the result

$$\operatorname{Var}(X) = \frac{nNM}{(N + M)^2}\left(1 - \frac{n - 1}{N + M - 1}\right)$$
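The variance formula can be confirmed exactly for a small urn. The following Python sketch computes the hypergeometric mean and variance from the probability mass function with exact fractions and compares them with $nN/(N + M)$ and the formula above. The function name and the choice of N = 6 light balls, M = 4 dark balls, and a sample of n = 5 are arbitrary illustrations.

```python
import math
from fractions import Fraction

def hypergeometric_pmf(i, n, N, M):
    """P{X = i}: i light balls in a sample of size n from N light and M dark balls."""
    return Fraction(math.comb(N, i) * math.comb(M, n - i), math.comb(N + M, n))

N, M, n = 6, 4, 5
support = range(max(0, n - M), min(n, N) + 1)  # values of i with positive probability
mean = sum(i * hypergeometric_pmf(i, n, N, M) for i in support)
var = sum(i * i * hypergeometric_pmf(i, n, N, M) for i in support) - mean ** 2

print(mean, Fraction(n * N, N + M))  # 3 3
print(var, Fraction(n * N * M, (N + M) ** 2) * (1 - Fraction(n - 1, N + M - 1)))  # 2/3 2/3
```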


2.9 Continuous Random Variables 


In this section we consider certain types of continuous random variables. 
Uniformly Distributed Random Variables

A random variable X is said to be uniformly distributed over the interval (a, b), a < b, if its probability density function is given by

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$

In other words, X is uniformly distributed over (a, b) if it puts all its mass on that interval and it is equally likely to be “near” any point on that interval.

The mean and variance of a uniform (a, b) random variable are obtained as follows:

$$E[X] = \frac{1}{b - a}\int_a^b x\,dx = \frac{b^2 - a^2}{2(b - a)} = \frac{b + a}{2}$$

$$E[X^2] = \frac{1}{b - a}\int_a^b x^2\,dx = \frac{b^3 - a^3}{3(b - a)} = \frac{a^2 + b^2 + ab}{3}$$

and so

$$\operatorname{Var}(X) = \frac{1}{3}(a^2 + b^2 + ab) - \frac{1}{4}(a^2 + b^2 + 2ab) = \frac{1}{12}(b - a)^2$$

Thus, for instance, the expected value is, as one might have expected, the midpoint of the interval (a, b).
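The distribution function of X is given, for a < x < b, by

$$F(x) = P\{X \le x\} = \int_a^x \frac{1}{b - a}\,dy = \frac{x - a}{b - a}$$

Figure 2.1. The normal density function (horizontal axis marked at $\mu - 3\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 3\sigma$).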
Normal Random Variables

A random variable X is said to be normally distributed with mean $\mu$ and variance $\sigma^2$ if its probability density function is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/2\sigma^2}, \qquad -\infty < x < \infty$$


The normal density is a bell-shaped curve that is symmetric about $\mu$ (see Figure 2.1).

It is not difficult to show that the parameters $\mu$ and $\sigma^2$ equal the expectation and variance of the normal. That is,

$$E[X] = \mu \quad \text{and} \quad \operatorname{Var}(X) = \sigma^2$$

An important fact about normal random variables is that if X is normal with mean $\mu$ and variance $\sigma^2$, then for any constants a and b with $a \ne 0$, $aX + b$ is normally distributed with mean $a\mu + b$ and variance $a^2\sigma^2$. It follows from this that if X is normal with mean $\mu$ and variance $\sigma^2$, then

$$Z = \frac{X - \mu}{\sigma}$$

is normal with mean 0 and variance 1. Such a random variable Z is said to have a standard (or unit) normal distribution. Let $\Phi$ denote the distribution function of a standard normal random variable; that is,

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy, \qquad -\infty < x < \infty$$


The result that $Z = (X - \mu)/\sigma$ has a standard normal distribution when X is normal with mean $\mu$ and variance $\sigma^2$ is quite useful because it allows us to evaluate all probabilities concerning X in terms of $\Phi$. For example, the distribution function of X can be expressed as

$$F(x) = P\{X \le x\} = P\left\{\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right\} = \Phi\left(\frac{x - \mu}{\sigma}\right)$$

The value of $\Phi(x)$ can be determined either by looking it up in a table or by writing a computer program to approximate it.
For $\alpha$ in the interval (0, 1), let $z_\alpha$ be such that

$$P\{Z > z_\alpha\} = 1 - \Phi(z_\alpha) = \alpha$$

That is, a standard normal will exceed $z_\alpha$ with probability $\alpha$ (see Figure 2.2). The value of $z_\alpha$ can be obtained from a table of the values of $\Phi$. For example, since

$$\Phi(1.64) \approx 0.95, \qquad \Phi(1.96) \approx 0.975, \qquad \Phi(2.33) \approx 0.99$$

we see that

$$z_{.05} \approx 1.64, \qquad z_{.025} \approx 1.96, \qquad z_{.01} \approx 2.33$$

Figure 2.2. $P\{Z > z_\alpha\} = \alpha$ (horizontal axis marked at 0 and $z_\alpha$).
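In practice $\Phi$ is usually computed rather than looked up. Here is a minimal Python sketch. It uses the identity $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$ with the standard library's `math.erf`, and it obtains $z_\alpha = \Phi^{-1}(1 - \alpha)$ from `statistics.NormalDist().inv_cdf` (available in Python 3.8 and later). The function names `Phi` and `normal_cdf` are illustrative.

```python
import math
from statistics import NormalDist

def Phi(x):
    """Standard normal distribution function, via Phi(x) = (1 + erf(x / sqrt 2)) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """F(x) = Phi((x - mu) / sigma) for a normal with mean mu and standard deviation sigma."""
    return Phi((x - mu) / sigma)

for x in (1.64, 1.96, 2.33):
    print(f"Phi({x}) = {Phi(x):.4f}")

# z_alpha solves 1 - Phi(z_alpha) = alpha, i.e. z_alpha = Phi^{-1}(1 - alpha).
std = NormalDist()  # standard normal from the Python standard library
for alpha in (0.05, 0.025, 0.01):
    print(f"z_{alpha} = {std.inv_cdf(1 - alpha):.4f}")
```

The computed values agree with the table values above to two decimal places. For instance, $\Phi(1.64) \approx 0.9495$ and $z_{.05} \approx 1.6449$.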


The wide applicability of normal random variables results from one of the most important theorems of probability theory, the central limit theorem, which asserts that the sum of a large number of independent random variables has approximately a normal distribution. The simplest form of this remarkable theorem is as follows.

The Central Limit Theorem Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables having finite mean $\mu$ and finite variance $\sigma^2$. Then

$$\lim_{n \to \infty} P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} < x\right\} = \Phi(x)$$
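The theorem can be illustrated by simulation. The Python sketch below assumes uniform (0, 1) summands (mean 1/2, variance 1/12) and a fixed random seed. It estimates the probability that the standardized sum is below x = 1 for n = 1, 2, and 12, and prints $\Phi(1) \approx 0.8413$ alongside. The estimates should move toward $\Phi(1)$ as n grows. The function names are illustrative.

```python
import math
import random

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def clt_estimate(n, x, reps, rng):
    """Estimate P{(X_1 + ... + X_n - n mu) / (sigma sqrt(n)) < x} for uniform(0, 1) X_i."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    hits = 0
    for _ in range(reps):
        s = sum(rng.random() for _ in range(n))
        if (s - n * mu) / (sigma * math.sqrt(n)) < x:
            hits += 1
    return hits / reps

rng = random.Random(2024)  # fixed seed for reproducibility
for n in (1, 2, 12):
    print(n, clt_estimate(n, 1.0, 20_000, rng), Phi(1.0))
```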


Exponential Random Variables

A continuous random variable having probability density function

$$f(x) = \lambda e^{-\lambda x}, \qquad 0 < x < \infty$$

for some $\lambda > 0$ is said to be an exponential random variable with parameter $\lambda$. Its cumulative distribution is given by

$$F(x) = \int_0^x \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda x}, \qquad 0 < x < \infty$$

It is easy to verify that the expected value and variance of such a random variable are as follows:

$$E[X] = \frac{1}{\lambda} \quad \text{and} \quad \operatorname{Var}(X) = \frac{1}{\lambda^2}$$


The key property of exponential random variables is that they possess the “memoryless property,” where we say that the nonnegative random variable X is memoryless if

$$P\{X > s + t \mid X > s\} = P\{X > t\} \quad \text{for all } s, t \ge 0 \tag{2.10}$$

To understand why the above is called the memoryless property, imagine that X represents the lifetime of some unit, and consider the probability that a unit of age s will survive an additional time t. Since this will occur if the lifetime of the unit exceeds $t + s$ given that it is still alive at time s, we see that

$$P\{\text{additional life of an item of age } s \text{ exceeds } t\} = P\{X > s + t \mid X > s\}$$


Thus, Equation (2.10) is a statement of fact that the distribution of the remaining life of an item of age s does not depend on s. That is, it is not necessary to remember the age of the unit to know its distribution of remaining life.
Equation (2.10) is equivalent to 


P{X > s+t} = P{X > s}P{X > t} 


As the above equation is satisfied whenever X is an exponential random variable (since, in this case, $P\{X > x\} = e^{-\lambda x}$), we see that exponential random variables are memoryless (and indeed it is not difficult to show that they are the only memoryless random variables).

Another useful property of exponential random variables is that they remain exponential when multiplied by a positive constant. To see this suppose that X is exponential with parameter $\lambda$, and let c be a positive number. Then

$$P\{cX \le x\} = P\left\{X \le \frac{x}{c}\right\} = 1 - e^{-\lambda x/c}$$

which shows that cX is exponential with parameter $\lambda/c$.


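Let $X_1, \ldots, X_n$ be independent exponential random variables with respective rates $\lambda_1, \ldots, \lambda_n$. A useful result is that $\min(X_1, \ldots, X_n)$ is exponential with rate $\sum_i \lambda_i$ and is independent of which one of the $X_i$ is the smallest. To verify this, let $M = \min(X_1, \ldots, X_n)$. Then,

$$\begin{aligned} P\{X_j = \min_i X_i \mid M > t\} &= P\{X_j - t = \min_i (X_i - t) \mid M > t\} \\ &= P\{X_j - t = \min_i (X_i - t) \mid X_i > t,\ i = 1, \ldots, n\} \\ &= P\{X_j = \min_i X_i\} \end{aligned}$$

The final equality follows because, by the lack of memory property of exponential random variables, given that $X_i$ exceeds t, the amount by which it exceeds it is exponential with rate $\lambda_i$. Consequently, the conditional distribution of $X_1 - t, \ldots, X_n - t$ given that all the $X_i$ exceed t is the same as the unconditional distribution of $X_1, \ldots, X_n$. Thus, M is independent of which of the $X_i$ is the smallest.

The result that the distribution of M is exponential with rate $\sum_i \lambda_i$ follows from

$$P\{M > t\} = P\{X_i > t,\ i = 1, \ldots, n\} = \prod_{i=1}^n P\{X_i > t\} = e^{-\sum_{i=1}^n \lambda_i t}$$

The probability that $X_j$ is the smallest is obtained from

$$\begin{aligned} P\{X_j = M\} &= \int_0^\infty P\{X_j = M \mid X_j = t\}\,\lambda_j e^{-\lambda_j t}\,dt \\ &= \int_0^\infty P\{X_i > t,\ i \ne j \mid X_j = t\}\,\lambda_j e^{-\lambda_j t}\,dt \\ &= \int_0^\infty P\{X_i > t,\ i \ne j\}\,\lambda_j e^{-\lambda_j t}\,dt \\ &= \int_0^\infty \left(\prod_{i \ne j} e^{-\lambda_i t}\right) \lambda_j e^{-\lambda_j t}\,dt \\ &= \lambda_j \int_0^\infty e^{-\sum_i \lambda_i t}\,dt \\ &= \frac{\lambda_j}{\sum_i \lambda_i} \end{aligned}$$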
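Two conclusions above can be checked by simulation: M is exponential with rate $\sum_i \lambda_i$, and $P\{X_j = M\} = \lambda_j / \sum_i \lambda_i$. This Python sketch uses `random.Random.expovariate`, whose argument is the rate $\lambda$. The rates 1, 2, 3, the seed, the number of runs, and the function name are illustrative choices.

```python
import random

def simulate_minimum(rates, reps, rng):
    """Simulate M = min(X_1, ..., X_n) for independent exponentials with the given rates.
    Returns the sample mean of M and the fraction of runs in which each X_j was smallest."""
    total = 0.0
    wins = [0] * len(rates)
    for _ in range(reps):
        draws = [rng.expovariate(lam) for lam in rates]
        m = min(draws)
        total += m
        wins[draws.index(m)] += 1
    return total / reps, [w / reps for w in wins]

rates = [1.0, 2.0, 3.0]
mean_m, freq = simulate_minimum(rates, 50_000, random.Random(7))
print(mean_m, 1 / sum(rates))                     # E[M] = 1 / (sum of rates)
print(freq, [lam / sum(rates) for lam in rates])  # P{X_j = M} = lambda_j / (sum of rates)
```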


The Poisson Process and Gamma Random Variables 


Suppose that “events” are occurring at random time points and let N(t) denote the number of events that occur in the time interval [0, t]. These events are said to constitute a Poisson process having rate $\lambda$, $\lambda > 0$, if

(a) N(0) = 0.
(b) The numbers of events occurring in disjoint time intervals are independent.
(c) The distribution of the number of events that occur in a given interval depends only on the length of the interval and not on its location.
(d) $\lim_{h \to 0} P\{N(h) = 1\}/h = \lambda$.
(e) $\lim_{h \to 0} P\{N(h) \ge 2\}/h = 0$.


Thus Condition (a) states that the process begins at time 0. Condition (b), the independent increment assumption, states that the number of events by time t [i.e., N(t)] is independent of the number of events that occur between t and $t + s$ [i.e., $N(t + s) - N(t)$]. Condition (c), the stationary increment assumption, states that the probability distribution of $N(t + s) - N(t)$ is the same for all values of t. Conditions (d) and (e) state that in a small interval of length h, the probability of one event occurring is approximately $\lambda h$, whereas the probability of two or more is approximately 0.

We now argue that these assumptions imply that the number of events occurring in an interval of length t is a Poisson random variable with mean $\lambda t$. To do so, consider the interval [0, t], and break it up into n nonoverlapping subintervals of length t/n (Figure 2.3). Consider first the number of these subintervals that contain an event. As each subinterval independently [by Condition (b)] contains an event with the same probability [by Condition (c)], which is approximately equal to $\lambda t/n$, it follows that the number of such intervals is a binomial random variable with parameters n and $p \approx \lambda t/n$. Hence, by the argument yielding the convergence of the binomial to the Poisson, we see by letting $n \to \infty$ that the number of such subintervals converges to a Poisson random variable with mean $\lambda t$. As it can be shown that Condition (e) implies that the probability that any of these subintervals contains two or more events goes to 0 as $n \to \infty$, it follows that N(t), the number of events that occur in [0, t], is a Poisson random variable with mean $\lambda t$.
Figure 2.3. The Interval [0, t].


For a Poisson process let $X_1$ denote the time of the first event. Furthermore, for n > 1, let $X_n$ denote the elapsed time between the $(n - 1)$st and the nth event. The sequence $\{X_n, n = 1, 2, \ldots\}$ is called the sequence of interarrival times. For instance, if $X_1 = 5$ and $X_2 = 10$, then the first event of the Poisson process will occur at time 5 and the second at time 15.

We now determine the distribution of the $X_n$. To do so, we first note that the event $\{X_1 > t\}$ takes place if and only if no events of the Poisson process occur in the interval [0, t]; thus

$$P\{X_1 > t\} = P\{N(t) = 0\} = e^{-\lambda t}$$
Hence, $X_1$ has an exponential distribution with mean $1/\lambda$. To obtain the distribution of $X_2$, note that

$$\begin{aligned} P\{X_2 > t \mid X_1 = s\} &= P\{0 \text{ events in } (s, s + t] \mid X_1 = s\} \\ &= P\{0 \text{ events in } (s, s + t]\} \\ &= e^{-\lambda t} \end{aligned}$$

where the last two equations followed from independent and stationary increments. Therefore, from the foregoing, we conclude that $X_2$ is also an exponential random variable with mean $1/\lambda$ and, furthermore, that $X_2$ is independent of $X_1$. Repeating the same argument yields:

Proposition The interarrival times $X_1, X_2, \ldots$ are independent and identically distributed exponential random variables with parameter $\lambda$.


Let $S_n = \sum_{i=1}^n X_i$ denote the time of the nth event. Since $S_n$ will be less than or equal to t if and only if there have been at least n events by time t, we see that

$$P\{S_n \le t\} = P\{N(t) \ge n\} = \sum_{j=n}^{\infty} e^{-\lambda t}\frac{(\lambda t)^j}{j!}$$

Since the left-hand side is the cumulative distribution function of $S_n$, we obtain, upon differentiation, that the density function of $S_n$, call it $f_n(t)$, is given by

$$\begin{aligned} f_n(t) &= \sum_{j=n}^{\infty} j\lambda e^{-\lambda t}\frac{(\lambda t)^{j-1}}{j!} - \sum_{j=n}^{\infty} \lambda e^{-\lambda t}\frac{(\lambda t)^j}{j!} \\ &= \sum_{j=n}^{\infty} \lambda e^{-\lambda t}\frac{(\lambda t)^{j-1}}{(j - 1)!} - \sum_{j=n}^{\infty} \lambda e^{-\lambda t}\frac{(\lambda t)^j}{j!} \\ &= \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n - 1)!} \end{aligned}$$

Definition A random variable having probability density function

$$f(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n - 1)!}, \qquad t > 0$$

is said to be a gamma random variable with parameters $(n, \lambda)$.

Thus we see that $S_n$, the time of the nth event of a Poisson process having rate $\lambda$, is a gamma random variable with parameters $(n, \lambda)$. In addition, we obtain from the representation $S_n = \sum_{i=1}^n X_i$ and the previous proposition, which stated that these $X_i$ are independent exponentials with rate $\lambda$, the following corollary.


Corollary The sum of n independent exponential random variables, each having parameter $\lambda$, is a gamma random variable with parameters $(n, \lambda)$.
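The proposition and corollary suggest a simple way to generate a Poisson process: add independent exponential interarrival times until the time horizon is passed. The Python sketch below does this with a fixed seed. It checks that N(5) has mean and variance close to $\lambda t = 10$ for $\lambda = 2$. It also checks that $S_3$, a gamma $(3, \lambda)$ random variable, has mean close to $3/\lambda$. The parameter values and the function name are illustrative.

```python
import random

def event_times(lam, t_end, rng):
    """Event times in [0, t_end] of a rate-lam Poisson process, generated by
    adding independent exponential(lam) interarrival times."""
    times = []
    t = rng.expovariate(lam)
    while t <= t_end:
        times.append(t)
        t += rng.expovariate(lam)
    return times

rng = random.Random(99)
lam, t_end, reps = 2.0, 5.0, 20_000
counts = [len(event_times(lam, t_end, rng)) for _ in range(reps)]
mean_count = sum(counts) / reps
var_count = sum((c - mean_count) ** 2 for c in counts) / (reps - 1)
print(mean_count, var_count, lam * t_end)  # N(t_end) is Poisson: mean = variance = lam * t_end

# S_3 = X_1 + X_2 + X_3 is gamma (3, lam), with mean 3/lam.
s3 = [sum(rng.expovariate(lam) for _ in range(3)) for _ in range(reps)]
print(sum(s3) / reps, 3 / lam)
```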


The Nonhomogeneous Poisson Process 


From a modeling point of view the major weakness of the Poisson process is its 
assumption that events are just as likely to occur in all intervals of equal size. A 
generalization, which relaxes this assumption, leads to the nonhomogeneous or 
nonstationary process. 

If “events” are occurring randomly in time, and N(t) denotes the number of events that occur by time t, then we say that $\{N(t), t \ge 0\}$ constitutes a nonhomogeneous Poisson process with intensity function $\lambda(t)$, $t \ge 0$, if

(a) N(0) = 0.
(b) The numbers of events that occur in disjoint time intervals are independent.
(c) $\lim_{h \to 0} P\{\text{exactly 1 event between } t \text{ and } t + h\}/h = \lambda(t)$.
(d) $\lim_{h \to 0} P\{\text{2 or more events between } t \text{ and } t + h\}/h = 0$.


The function m(t) defined by

$$m(t) = \int_0^t \lambda(s)\,ds, \qquad t \ge 0$$

is called the mean-value function. The following result can be established.


Proposition $N(t + s) - N(t)$ is a Poisson random variable with mean $m(t + s) - m(t)$.

The quantity $\lambda(t)$, called the intensity at time t, indicates how likely it is that an event will occur around the time t. [Note that when $\lambda(t) = \lambda$ the nonhomogeneous process reverts to the usual Poisson process.] The following proposition gives a useful way of interpreting a nonhomogeneous Poisson process.


Proposition Suppose that events are occurring according to a Poisson process having rate $\lambda$, and suppose that, independently of anything that came before, an event that occurs at time t is counted with probability p(t). Then the process of counted events constitutes a nonhomogeneous Poisson process with intensity function $\lambda(t) = \lambda p(t)$.
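The proposition also describes one way to generate a nonhomogeneous Poisson process: simulate a rate-$\lambda$ process and keep each event at time t with probability p(t). The Python sketch below is a minimal illustration under assumed values $\lambda = 3$ and p(t) = t/10 on [0, 10], so that $\lambda(t) = 0.3t$ and m(10) = 15. It requires $0 \le p(t) \le 1$. The function names are illustrative.

```python
import random

def thinned_event_times(lam, p, t_end, rng):
    """Counted events on [0, t_end]: run a rate-lam Poisson process and keep an
    event at time t with probability p(t), independently of everything else."""
    kept = []
    t = rng.expovariate(lam)
    while t <= t_end:
        if rng.random() < p(t):
            kept.append(t)
        t += rng.expovariate(lam)
    return kept

def p(t):
    """Hypothetical counting probability p(t) = t / 10 on [0, 10]."""
    return t / 10.0

# With lam = 3, lambda(t) = 0.3 t and m(10) = integral of 0.3 t from 0 to 10 = 15.
lam, t_end = 3.0, 10.0
rng = random.Random(11)
reps = 10_000
mean_count = sum(len(thinned_event_times(lam, p, t_end, rng)) for _ in range(reps)) / reps
print(mean_count)  # should be near m(10) = 15
```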


Proof This proposition is proved by noting that the previously given conditions are all satisfied. Conditions (a), (b), and (d) follow since the corresponding result is true for all (not just the counted) events. Condition (c) follows since

$$\begin{aligned} P\{1 \text{ counted event between } t \text{ and } t + h\} &= P\{1 \text{ event and it is counted}\} \\ &\quad + P\{2 \text{ or more events and exactly 1 is counted}\} \\ &\approx \lambda h\,p(t) \end{aligned}$$

2.10 Conditional Expectation and Conditional Variance 


If X and Y are jointly discrete random variables, we define $E[X \mid Y = y]$, the conditional expectation of X given that Y = y, by

$$E[X \mid Y = y] = \sum_x x\,P\{X = x \mid Y = y\} = \frac{\sum_x x\,P\{X = x, Y = y\}}{P\{Y = y\}}$$


In other words, the conditional expectation of X, given that Y = y, is defined 
like E[X] as a weighted average of all the possible values of X, but now with 
the weight given to the value x being equal to the conditional probability that X 
equals x given that Y equals y. 

Similarly, if $X$ and $Y$ are jointly continuous with joint density function $f(x, y)$, we define the conditional expectation of $X$, given that $Y = y$, by

$$E[X \mid Y = y] = \frac{\int x f(x, y)\,dx}{\int f(x, y)\,dx}$$

Let $E[X \mid Y]$ denote that function of the random variable $Y$ whose value at $Y = y$ is $E[X \mid Y = y]$, and note that $E[X \mid Y]$ is itself a random variable. The following proposition is quite useful.


Proposition

$$E\big[E[X \mid Y]\big] = E[X] \qquad (2.11)$$

If $Y$ is a discrete random variable, then Equation (2.11) states that

$$E[X] = \sum_y E[X \mid Y = y]\,P\{Y = y\}$$




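whereas if $Y$ is continuous with density $g$, then (2.11) states

$$E[X] = \int E[X \mid Y = y]\,g(y)\,dy$$

We now give a proof of the preceding proposition when $X$ and $Y$ are discrete:

$$\begin{aligned}
\sum_y E[X \mid Y = y]\,P\{Y = y\} &= \sum_y \sum_x x\,P\{X = x \mid Y = y\}\,P\{Y = y\} \\
&= \sum_y \sum_x x\,P\{X = x, Y = y\} \\
&= \sum_x x \sum_y P\{X = x, Y = y\} \\
&= \sum_x x\,P\{X = x\} \\
&= E[X]
\end{aligned}$$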


We can also define the conditional variance of X, given the value of Y, 
as follows: 


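$$\mathrm{Var}(X \mid Y) = E\big[(X - E[X \mid Y])^2 \mid Y\big]$$

That is, $\mathrm{Var}(X \mid Y)$ is a function of $Y$, which at $Y = y$ is equal to the variance of $X$ given that $Y = y$. By the same reasoning that yields the identity $\mathrm{Var}(X) = E[X^2] - (E[X])^2$ we have that

$$\mathrm{Var}(X \mid Y) = E[X^2 \mid Y] - (E[X \mid Y])^2$$

Taking expectations of both sides of the above equation gives

$$\begin{aligned}
E[\mathrm{Var}(X \mid Y)] &= E\big[E[X^2 \mid Y]\big] - E\big[(E[X \mid Y])^2\big] \\
&= E[X^2] - E\big[(E[X \mid Y])^2\big] \qquad (2.12)
\end{aligned}$$

Also, because $E[E[X \mid Y]] = E[X]$, we have that

$$\mathrm{Var}(E[X \mid Y]) = E\big[(E[X \mid Y])^2\big] - (E[X])^2 \qquad (2.13)$$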


Upon adding Equations (2.12) and (2.13) we obtain the following identity, known 
as the conditional variance formula. 


The Conditional Variance Formula 


$$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$$
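As a check on (2.11) and the conditional variance formula, the sketch below evaluates both sides exactly with Python's `fractions.Fraction` for a small joint probability mass function. The pmf in `joint` is made up purely for illustration; any pmf would do, since both identities hold exactly.

```python
from fractions import Fraction

# A small joint pmf P{X = x, Y = y}, chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8), (2, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4), (3, 1): Fraction(1, 4),
}

def marginal_y(y):
    """P{Y = y}."""
    return sum(pr for (x, yy), pr in joint.items() if yy == y)

def cond_moment(k, y):
    """E[X^k | Y = y] = sum_x x^k P{X = x, Y = y} / P{Y = y}."""
    return sum(x**k * pr for (x, yy), pr in joint.items() if yy == y) / marginal_y(y)

ys = sorted({y for (_, y) in joint})
EX = sum(x * pr for (x, _), pr in joint.items())
EX2 = sum(x * x * pr for (x, _), pr in joint.items())
var_x = EX2 - EX**2

# Equation (2.11): E[E[X|Y]] = E[X]
tower = sum(cond_moment(1, y) * marginal_y(y) for y in ys)

# E[Var(X|Y)] and Var(E[X|Y]) from the conditional distributions
e_cond_var = sum((cond_moment(2, y) - cond_moment(1, y)**2) * marginal_y(y) for y in ys)
var_cond_mean = sum(cond_moment(1, y)**2 * marginal_y(y) for y in ys) - tower**2

print(tower == EX, var_x == e_cond_var + var_cond_mean)   # True True
```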
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Exercises 
1. 
(a) For any events A and B show that 
$A \cup B = A \cup A^cB$
$B = AB \cup A^cB$


(b) Show that 
$P(A \cup B) = P(A) + P(B) - P(AB)$


2. Consider an experiment that consists of six horses, numbered 1 through 6, 
running a race, and suppose that the sample space is given by 


S = {all orderings of (1, 2,3, 4,5, 6)} 


Let A denote the event that the number 1 horse is among the top three finishers, 
let B denote the event that the number 2 horse comes in second, and let C denote 
the event that the number 3 horse comes in third. 


(a) Describe the event $A \cup B$. How many outcomes are contained in this
event? 

(b) How many outcomes are contained in the event AB? 

(c) How many outcomes are contained in the event ABC? 

(d) How many outcomes are contained in the event $A \cup BC$?


3. A couple has two children. What is the probability that both are girls given 
that the elder is a girl? Assume that all four possibilities are equally likely. 


4. The king comes from a family of two children. What is the probability that 
the other child is his brother? 


5. The random variable $X$ takes on one of the values 1, 2, 3, 4 with probabilities

$$P\{X = i\} = ic, \qquad i = 1, 2, 3, 4$$

for some value $c$. Find $P\{2 \le X \le 3\}$.


6. The continuous random variable $X$ has a probability density function given by

$$f(x) = cx, \qquad 0 < x < 1$$

Find $P\{X > 1/2\}$.
7. If X and Y have a joint probability density function specified by 
f(x,y) = 200, 0<x<0,0<y<oo 


Find P{X < Y}. 
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8. Find the expected value of the random variable specified in Exercise 5. 
9. Find E[X] for the random variable of Exercise 6. 


10. There are 10 different types of coupons and each time one obtains a coupon 
it is equally likely to be any of the 10 types. Let X denote the number of 
distinct types contained in a collection of N coupons, and find E[X]. (Hint: For 
$i = 1, \ldots, 10$ let

$$X_i = \begin{cases} 1 & \text{if a type } i \text{ coupon is among the } N \\ 0 & \text{otherwise} \end{cases}$$

and make use of the representation $X = \sum_{i=1}^{10} X_i$.)
11. A die having six sides is rolled. If each of the six possible outcomes is 


equally likely, determine the variance of the number that appears. 


12. Suppose that X has probability density function 
$$f(x) = ce^x, \qquad 0 < x < 1$$


Determine Var(X). 
13. Show that $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.


14. Suppose that X, the amount of liquid apple contained in a container of 
commercial apple juice, is a random variable having mean 4 grams. 


(a) What can be said about the probability that a given container contains 
more than 6 grams of liquid apple? 

(b) If $\mathrm{Var}(X) = 4\ (\text{grams})^2$, what can be said about the probability that a
given container will contain between 3 and 5 grams of liquid apple? 


15. An airplane needs at least half of its engines to safely complete its mission. 
If each engine independently functions with probability p, for what values of p 
is a three-engine plane safer than a five-engine plane? 


16. For a binomial random variable X with parameters (n, p), show that 
P{X =i} first increases and then decreases, reaching its maximum value when i 
is the largest integer less than or equal to $(n+1)p$.


17. If X and Y are independent binomial random variables with respective 
parameters $(n, p)$ and $(m, p)$, argue, without any calculations, that $X + Y$ is
binomial with parameters (n+ m, p). 
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18. Explain why the following random variables all have approximately a 
Poisson distribution: 


(a) The number of misprints in a given chapter of this book. 
(b) The number of wrong telephone numbers dialed daily. 
(c) The number of customers that enter a given post office on a given day. 


19. If $X$ is a Poisson random variable with parameter $\lambda$, show that

(a) $E[X] = \lambda$.
(b) $\mathrm{Var}(X) = \lambda$.


20. Let $X$ and $Y$ be independent Poisson random variables with respective parameters $\lambda_1$ and $\lambda_2$. Use the result of Exercise 17 to heuristically argue that $X + Y$ is Poisson with parameter $\lambda_1 + \lambda_2$. Then give an analytic proof of this. [Hint: $P\{X + Y = k\} = \sum_{i=0}^{k} P\{X = i, Y = k - i\} = \sum_{i=0}^{k} P\{X = i\}\,P\{Y = k - i\}$.]
21. Explain how to make use of the relationship

$$p_{i+1} = \frac{\lambda}{i+1}\,p_i$$

to compute efficiently the Poisson probabilities.
22. Find P{X > n} when X is a geometric random variable with parameter p. 


23. Two players play a certain game until one has won a total of five games. If 
player A wins each individual game with probability 0.6, what is the probability 
she will win the match? 


24. Consider the hypergeometric model of Section 2.8, and suppose that the white balls are all numbered. For $i = 1, \ldots, N$ let

$$Y_i = \begin{cases} 1 & \text{if white ball numbered } i \text{ is selected} \\ 0 & \text{otherwise} \end{cases}$$

Argue that $X = \sum_{i=1}^{N} Y_i$, and then use this representation to determine $E[X]$. Verify that this checks with the result given in Section 2.8.


25. The bus will arrive at a time that is uniformly distributed between 8 and 
8:30 A.M. If we arrive at 8 A.M., what is the probability that we will wait between 
5 and 15 minutes? 


26. For a normal random variable with parameters $\mu$ and $\sigma^2$ show that

(a) $E[X] = \mu$.
(b) $\mathrm{Var}(X) = \sigma^2$.
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27. Let X be a binomial random variable with parameters (n, p). Explain why 


X—np 1 ae 
P | —— < x} Z — e* Pdx 
| Ast | 2T J. 
when n is large. 


28. If $X$ is an exponential random variable with parameter $\lambda$, show that

(a) $E[X] = 1/\lambda$.
(b) $\mathrm{Var}(X) = 1/\lambda^2$.


29. Persons A, B, and C are waiting at a bank having two tellers when it opens in the morning. Persons A and B each go to a teller and C waits in line. If the time it takes to serve a customer is an exponential random variable with parameter $\lambda$, what is the probability that C is the last to leave the bank? [Hint: No computations are necessary.]


30. Let $X$ and $Y$ be independent exponential random variables with respective rates $\lambda$ and $\mu$. Is $\max(X, Y)$ an exponential random variable?


31. Consider a Poisson process in which events occur at a rate 0.3 per hour. 
What is the probability that no events occur between 10 A.M. and 2 P.M.? 


32. For a Poisson process with rate $\lambda$, find $P\{N(s) = k \mid N(t) = n\}$ when $s < t$.
33. Repeat Exercise 32 for s > t. 


34. If $X$ is a gamma random variable with parameters $(n, \lambda)$, find


(a) E[X]. 
(b) Var(X). 


35. An urn contains four white and six black balls. A random sample of size 4 
is chosen. Let X denote the number of white balls in the sample. An additional 
ball is now selected from the remaining six balls in the urn. Let Y equal 1 if this 
ball is white and 0 if it is black. Find 


(a) $E[Y \mid X = 2]$.
(b) $E[X \mid Y = 1]$.
(c) $\mathrm{Var}(Y \mid X = 0)$.
(d) $\mathrm{Var}(X \mid Y = 1)$.


36. If X and Y are independent and identically distributed exponential random 
variables, show that the conditional distribution of X, given that X +Y =t, is 
the uniform distribution on (0, t). 
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Random Numbers 


Introduction 


The building block of a simulation study is the ability to generate random 
numbers, where a random number represents the value of a random variable 
uniformly distributed on (0, 1). In this chapter we explain how such numbers 
are computer generated and also begin to illustrate their uses. 


3.1 Pseudorandom Number Generation 


Whereas random numbers were originally either manually or mechanically generated, by using such techniques as spinning wheels, or dice rolling, or card shuffling, the modern approach is to use a computer to successively generate
pseudorandom numbers. These pseudorandom numbers constitute a sequence 
of values, which, although they are deterministically generated, have all the 
appearances of being independent uniform (0, 1) random variables. 

One of the most common approaches to generating pseudorandom numbers starts with an initial value $x_0$, called the seed, and then recursively computes successive values $x_n$, $n \ge 1$, by letting

$$x_n = a x_{n-1} \text{ modulo } m \qquad (3.1)$$

where $a$ and $m$ are given positive integers, and where the above means that $a x_{n-1}$ is divided by $m$ and the remainder is taken as the value of $x_n$. Thus, each $x_n$ is one of $0, 1, \ldots, m-1$, and the quantity $x_n/m$ (called a pseudorandom number) is taken as an approximation to the value of a uniform $(0, 1)$ random variable.


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The approach specified by Equation (3.1) to generate random numbers is called 
the multiplicative congruential method. Since each of the numbers $x_n$ assumes one of the values $0, 1, \ldots, m-1$, it follows that after some finite number (of at most $m$) of generated values a value must repeat itself; and once this happens the whole sequence will begin to repeat. Thus, we want to choose the constants $a$ and $m$ so that, for any initial seed $x_0$, the number of variables that can be generated before this repetition occurs is large.

In general the constants a and m should be chosen to satisfy three criteria: 


1. For any initial seed, the resultant sequence has the “appearance” of being a 
sequence of independent uniform (0, 1) random variables. 

2. For any initial seed, the number of variables that can be generated before 
repetition begins is large. 

3. The values can be computed efficiently on a digital computer. 


A guideline that appears to be of help in satisfying the above three conditions is that $m$ should be chosen to be a large prime number that can be fitted to the computer word size. For a 32-bit word machine (where the first bit is a sign bit) it has been shown that the choices of $m = 2^{31} - 1$ and $a = 7^5 = 16{,}807$ result in desirable properties. (For a 36-bit word machine the choices of $m = 2^{35} - 31$ and $a = 5^5$ appear to work well.)
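A minimal Python sketch of Equation (3.1) with the 32-bit constants quoted above. Python integers do not overflow, so the sketch ignores the word-size tricks a real implementation would need; the function name is our own. The same $a$ and $m$ are used by C++'s `std::minstd_rand0`, whose specification states that the 10000th value produced from the default seed 1 is 1043618065, which gives a convenient check.

```python
def multiplicative_congruential(seed, count, a=7**5, m=2**31 - 1):
    """Return x_1, ..., x_count from x_n = a * x_{n-1} mod m (Equation (3.1)).

    The seed should lie in 1, ..., m - 1; a seed of 0 would produce 0 forever.
    """
    xs = []
    x = seed
    for _ in range(count):
        x = (a * x) % m
        xs.append(x)
    return xs

m = 2**31 - 1
xs = multiplicative_congruential(seed=1, count=10_000)
us = [x / m for x in xs]      # pseudorandom numbers x_n / m
print(xs[0], xs[-1])          # 16807 1043618065
```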

Another generator of pseudorandom numbers uses recursions of the type 


$$x_n = (a x_{n-1} + c) \text{ modulo } m$$

Such generators are called mixed congruential generators (as they involve both an additive and a multiplicative term). When using generators of this type, one often chooses $m$ to be 2 raised to the computer's word length (for instance, $m = 2^{32}$ on a 32-bit machine), since this makes the computation of $(a x_{n-1} + c)$ modulo $m$ (that is, the division of $a x_{n-1} + c$ by $m$) quite efficient.
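To illustrate why a power-of-two modulus is cheap, the sketch below runs a mixed congruential generator with $m = 2^{32}$ twice: once with the `%` operator and once by keeping only the low 32 bits with a bitwise AND, and checks that the two agree. The constants `A` and `C` are illustrative assumptions chosen for this sketch, not values given in the text.

```python
def mixed_congruential(seed, count, a, c, m):
    """Return x_1, ..., x_count from x_n = (a * x_{n-1} + c) mod m."""
    xs = []
    x = seed
    for _ in range(count):
        x = (a * x + c) % m
        xs.append(x)
    return xs

M32 = 2**32
A, C = 1664525, 1013904223     # illustrative constants (an assumption, not from the text)

seq_mod = mixed_congruential(12345, 1000, A, C, M32)

# With m = 2**32, "mod m" just keeps the low 32 bits: a single bitwise AND.
seq_mask = []
x = 12345
for _ in range(1000):
    x = (A * x + C) & (M32 - 1)
    seq_mask.append(x)

print(seq_mod == seq_mask)     # True
```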

As our starting point in the computer simulation of systems we suppose that 
we can generate a sequence of pseudorandom numbers which can be taken as an 
approximation to the values of a sequence of independent uniform (0, 1) random 
variables. That is, we do not explore the interesting theoretical questions, which 
involve material outside the scope of this text, relating to the construction of 
“good” pseudorandom number generators. Rather, we assume that we have a 
“black box” that gives a random number on request. 


3.2 Using Random Numbers to Evaluate Integrals 


One of the earliest applications of random numbers was in the computation of 
integrals. Let $g(x)$ be a function and suppose we wanted to compute $\theta$ where

$$\theta = \int_0^1 g(x)\,dx$$
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To compute the value of 0, note that if U is uniformly distributed over (0, 1), 
then we can express 6 as 


6 = E[g(U)] 


If $U_1, \ldots, U_k$ are independent uniform $(0, 1)$ random variables, it thus follows that the random variables $g(U_1), \ldots, g(U_k)$ are independent and identically distributed random variables having mean $\theta$. Therefore, by the strong law of large numbers, it follows that, with probability 1,

$$\sum_{i=1}^{k} \frac{g(U_i)}{k} \to E[g(U)] = \theta \qquad \text{as } k \to \infty$$

Hence we can approximate $\theta$ by generating a large number of random numbers $u_i$ and taking as our approximation the average value of $g(u_i)$. This approach to approximating integrals is called the Monte Carlo approach.
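A minimal Python sketch of the Monte Carlo approach, using the standard library's `random.Random` as the "black box" source of random numbers. The test integrand $g(x) = x^2$, whose exact integral over $(0, 1)$ is $1/3$, is our own choice for illustration.

```python
import random

def monte_carlo_integral(g, k, rng):
    """Estimate theta = integral of g over (0, 1) by the average of g(U_1), ..., g(U_k)."""
    return sum(g(rng.random()) for _ in range(k)) / k

rng = random.Random(2024)
estimate = monte_carlo_integral(lambda x: x * x, 100_000, rng)
print(estimate)   # close to the exact value 1/3
```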

If we wanted to compute

$$\theta = \int_a^b g(x)\,dx$$

then, by making the substitution $y = (x - a)/(b - a)$, $dy = dx/(b - a)$, we see that

$$\begin{aligned}
\theta &= \int_0^1 g(a + [b - a]y)\,(b - a)\,dy \\
&= \int_0^1 h(y)\,dy
\end{aligned}$$

where $h(y) = (b - a)\,g(a + [b - a]y)$. Thus, we can approximate $\theta$ by continually generating random numbers and then taking the average value of $h$ evaluated at these random numbers.
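The change of variables translates directly into code: average $h(U_i) = (b - a)\,g(a + [b - a]U_i)$. In this sketch the function name and the example $\int_0^\pi \sin x\,dx = 2$ are illustrative choices, not from the text.

```python
import math
import random

def monte_carlo_integral_ab(g, a, b, k, rng):
    """Estimate the integral of g over (a, b) as the average of h(U_i),
    where h(y) = (b - a) * g(a + (b - a) * y)."""
    def h(y):
        return (b - a) * g(a + (b - a) * y)
    return sum(h(rng.random()) for _ in range(k)) / k

rng = random.Random(7)
estimate = monte_carlo_integral_ab(math.sin, 0.0, math.pi, 100_000, rng)
print(estimate)   # close to the exact value 2
```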


Similarly, if we wanted

$$\theta = \int_0^\infty g(x)\,dx$$

we could apply the substitution $y = 1/(x + 1)$, $dy = -dx/(x + 1)^2 = -y^2\,dx$, to obtain the identity

$$\theta = \int_0^1 h(y)\,dy$$

where

$$h(y) = \frac{g\left(\frac{1}{y} - 1\right)}{y^2}$$
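A sketch of the infinite-range case, averaging $h(U_i) = g(1/U_i - 1)/U_i^2$. Because `random()` can return exactly 0.0, the sketch uses `1.0 - rng.random()`, which lies in $(0, 1]$, so $1/y$ is always defined. The example $\int_0^\infty e^{-x}\,dx = 1$ is our own choice.

```python
import math
import random

def monte_carlo_integral_0_inf(g, k, rng):
    """Estimate the integral of g over (0, infinity) via y = 1/(x + 1),
    averaging h(U_i) = g(1/U_i - 1) / U_i**2."""
    total = 0.0
    for _ in range(k):
        y = 1.0 - rng.random()          # lies in (0, 1], so 1/y is always defined
        total += g(1.0 / y - 1.0) / (y * y)
    return total / k

rng = random.Random(99)
estimate = monte_carlo_integral_0_inf(lambda x: math.exp(-x), 100_000, rng)
print(estimate)   # close to the exact value 1
```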
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The utility of using random numbers to approximate integrals becomes more 


apparent in the case of multidimensional integrals. Suppose that g is a function 
with an n-dimensional argument and that we are interested in computing 


o=f f.. f rrt dX, +++ dX, 


The key to the Monte Carlo approach to estimate @ lies in the fact that 9 can be 
expressed as the following expectation: 


0=E[g(U,,.--,U,)] 
where U,,..., U,, are independent uniform (0, 1) random variables. Hence, if 


we generate k independent sets, each consisting of n independent uniform (0, 1) 
random variables 


Ul,..., U} 
Ues U 
UF US 
then, since the random variables g(Uj,..., U'),i=1,...,k, are all independent 


and identically distributed random variables with mean 0, we can estimate 0 by 
Dias Uiers UE: 
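The multidimensional estimator is the same average, now over $k$ independent points in the unit cube. In the sketch below the integrand $g(x_1, x_2, x_3) = x_1 x_2 x_3$, whose integral over the unit cube is $1/8$, is an illustrative choice.

```python
import random

def monte_carlo_integral_nd(g, n, k, rng):
    """Estimate the integral of g over the n-dimensional unit cube by
    averaging g(U_1^i, ..., U_n^i) over k independent sets of n random numbers."""
    total = 0.0
    for _ in range(k):
        point = [rng.random() for _ in range(n)]   # one set U_1^i, ..., U_n^i
        total += g(point)
    return total / k

rng = random.Random(31)
estimate = monte_carlo_integral_nd(lambda x: x[0] * x[1] * x[2], 3, 100_000, rng)
print(estimate)   # close to the exact value 1/8
```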

For an application of the above, consider the following approach to estimating $\pi$.


Example 3a The Estimation of $\pi$ Suppose that the random vector $(X, Y)$ is uniformly distributed in the square of area 4 centered at the origin. That is, it is a random point in the region specified in Figure 3.1. Let us consider now the probability that this random point in the square is contained within the inscribed circle of radius 1 (see Figure 3.2). Note that since $(X, Y)$ is uniformly distributed in the square it follows that

$$P\{(X, Y) \text{ is in the circle}\} = P\{X^2 + Y^2 \le 1\} = \frac{\text{Area of the circle}}{\text{Area of the square}} = \frac{\pi}{4}$$

Hence, if we generate a large number of random points in the square, the proportion of points that fall within the circle will be approximately $\pi/4$. Now
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(-1, 1) 4,1) 


¢-1, -1) a, -1) 
ə= (0,0) 


Figure 3.1. Square. 


(1, 1) qd, 1) 


on 


(-1, —1) (1, -1) 
e = (0, 0) 


Figure 3.2. Circle within Square. 


if $X$ and $Y$ were independent and both were uniformly distributed over $(-1, 1)$, their joint density would be

$$f(x, y) = f(x)f(y) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}, \qquad -1 \le x \le 1,\ -1 \le y \le 1$$
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Since the density function of (X, Y) is constant in the square, it thus follows 
(by definition) that (X, Y) is uniformly distributed in the square. Now if U is 
uniform on (0, 1) then 2U is uniform on (0, 2), and so 2U —1 is uniform on 
(—1, 1). Therefore, if we generate random numbers U, and U,, set X =2U,—1 
and Y = 2U, — 1, and define 


j=]! eee 
“10 otherwise 


E[N = P(X +Y <1}= 3 


Hence we can estimate 7/4 by generating a large number of pairs of ran- 
dom numbers u#,, u, and estimating 7/4 by the fraction of pairs for which 
(2u, —1} + (2m —1)? <1. Oo 
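The estimator of Example 3a fits in a few lines of Python. This is a sketch: the function name, the seed, and the number of pairs are arbitrary choices.

```python
import math
import random

def estimate_pi(pairs, rng):
    """Estimate pi as 4 times the fraction of pairs (u1, u2) with
    (2*u1 - 1)**2 + (2*u2 - 1)**2 <= 1."""
    inside = 0
    for _ in range(pairs):
        x = 2.0 * rng.random() - 1.0   # X = 2U_1 - 1, uniform on (-1, 1)
        y = 2.0 * rng.random() - 1.0   # Y = 2U_2 - 1, uniform on (-1, 1)
        if x * x + y * y <= 1.0:       # the indicator I equals 1
            inside += 1
    return 4.0 * inside / pairs

rng = random.Random(314)
pi_hat = estimate_pi(200_000, rng)
print(pi_hat, math.pi)
```

Since each pair contributes an independent Bernoulli($\pi/4$) indicator, the standard error of the estimate shrinks like $1/\sqrt{\text{pairs}}$. With 200,000 pairs it is roughly 0.004.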


Thus, random number generators can be used to generate the values of uni- 
form (0, 1) random variables. Starting with these random numbers we show in 
Chapters 4 and 5 how we can generate the values of random variables from 
arbitrary distributions. With this ability to generate arbitrary random variables 
we will be able to simulate a probability system—that is, we will be able to gen- 
erate, according to the specified probability laws of the system, all the random 
quantities of this system as it evolves over time. 


Exercises 


1. If $x_0 = 5$ and

$$x_n = 3x_{n-1} \bmod 150$$

find $x_1, \ldots, x_{10}$.

2. If $x_0 = 3$ and

$$x_n = (5x_{n-1} + 7) \bmod 200$$

find $x_1, \ldots, x_{10}$.


In Exercises 3-9 use simulation to approximate the following integrals. Compare 
your estimate with the exact answer if known. 


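3. $\int_0^1 \exp\{e^x\}\,dx$

4. $\int_0^1 (1 - x^2)^{3/2}\,dx$

5. $\int_{-2}^{2} e^{x + x^2}\,dx$

6. $\int_0^\infty x(1 + x^2)^{-2}\,dx$

7. $\int_{-\infty}^{\infty} e^{-x^2}\,dx$

8. $\int_0^1 \int_0^1 e^{(x + y)^2}\,dy\,dx$

9. $\int_0^\infty \int_0^x e^{-(x + y)}\,dy\,dx$

[Hint: Let $I_y(x) = \begin{cases} 1 & \text{if } y < x \\ 0 & \text{if } y \ge x \end{cases}$ and use this function to equate the integral to one in which both terms go from 0 to $\infty$.]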


10. Use simulation to approximate $\mathrm{Cov}(U, e^U)$, where $U$ is uniform on $(0, 1)$.
Compare your approximation with the exact answer. 


11. Let U be uniform on (0, 1). Use simulation to approximate the following: 


(a) $\mathrm{Corr}(U, \sqrt{1 - U^2})$.
(b) $\mathrm{Corr}(U^2, \sqrt{1 - U^2})$.


12. For uniform $(0, 1)$ random variables $U_1, U_2, \ldots$ define

$$N = \text{Minimum}\left\{n : \sum_{i=1}^{n} U_i > 1\right\}$$


That is, N is equal to the number of random numbers that must be summed to 
exceed 1. 


(a) Estimate E[N] by generating 100 values of N. 
(b) Estimate E[N] by generating 1000 values of N. 
(c) Estimate E[N] by generating 10,000 values of N. 
(d) What do you think is the value of E[N]? 


13. Let $U_i$, $i \ge 1$, be random numbers. Define $N$ by

$$N = \text{Maximum}\left\{n : \prod_{i=1}^{n} U_i \ge e^{-3}\right\}$$

where $\prod_{i=1}^{0} U_i \equiv 1$.


(a) Find E[N] by simulation. 
(b) Find P{N = i}, for i=0, 1, 2,3, 4,5, 6, by simulation. 


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14. With x, = 23, x, = 66, and 
Xa = 3xX,-1+5x,-2 mod(100), n>3 


we will call the sequence u,, = x,,/100, n > 1, the text’s random number sequence. 
Find its first 14 values. 
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Generating Discrete 
Random Variables 


4.1 The Inverse Transform Method 


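Suppose we want to generate the value of a discrete random variable $X$ having probability mass function

$$P\{X = x_j\} = p_j, \qquad j = 0, 1, \ldots, \qquad \sum_j p_j = 1$$

To accomplish this, we generate a random number $U$ (that is, $U$ is uniformly distributed over $(0, 1)$) and set

$$X = \begin{cases}
x_0 & \text{if } U < p_0 \\
x_1 & \text{if } p_0 \le U < p_0 + p_1 \\
\ \vdots & \\
x_j & \text{if } \sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i \\
\ \vdots &
\end{cases}$$

Since, for $0 < a < b < 1$, $P\{a \le U < b\} = b - a$, we have that

$$P\{X = x_j\} = P\left\{\sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\right\} = p_j$$

and so $X$ has the desired distribution.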
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Remarks 
1. The preceding can be written algorithmically as 
Generate a random number $U$
If $U < p_0$ set $X = x_0$ and stop
If $U < p_0 + p_1$ set $X = x_1$ and stop
If $U < p_0 + p_1 + p_2$ set $X = x_2$ and stop
⋮


2. If the $x_i$, $i \ge 0$, are ordered so that $x_0 < x_1 < x_2 < \cdots$ and if we let $F$ denote the distribution function of $X$, then $F(x_k) = \sum_{i=0}^{k} p_i$ and so

$$X \text{ will equal } x_j \text{ if } F(x_{j-1}) \le U < F(x_j)$$

In other words, after generating a random number $U$ we determine the value of $X$ by finding the interval $[F(x_{j-1}), F(x_j))$ in which $U$ lies [or, equivalently, by finding the inverse of $F(U)$]. It is for this reason that the above is called the discrete inverse transform method for generating $X$. □
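The algorithm in Remark 1 can be sketched in Python as a single pass over the running sums $p_0 + \cdots + p_j$. The function name and the final fallback line, which only guards against floating-point sums falling slightly short of 1, are our own additions. The probabilities in the usage example are the ones used in Example 4a.

```python
import random

def discrete_inverse_transform(values, probs, u):
    """Return x_j for the first j with u < p_0 + ... + p_j.

    `values` and `probs` hold the x_j and p_j in the order in which they are
    to be checked; `u` is a random number in [0, 1).
    """
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return x
    return values[-1]   # only reached if rounding leaves the total slightly below u

rng = random.Random(1)
values = [1, 2, 3, 4]
probs = [0.20, 0.15, 0.25, 0.40]
sample = [discrete_inverse_transform(values, probs, rng.random()) for _ in range(100_000)]
freq = {v: sample.count(v) / len(sample) for v in values}
print(freq)   # close to {1: 0.20, 2: 0.15, 3: 0.25, 4: 0.40}
```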


The amount of time it takes to generate a discrete random variable by the above method is proportional to the number of intervals one must search. For this reason it is sometimes worthwhile to consider the possible values $x_j$ of $X$ in decreasing order of the $p_j$.

Example 4a If we wanted to simulate a random variable X such that 
$p_1 = 0.20$, $p_2 = 0.15$, $p_3 = 0.25$, $p_4 = 0.40$, where $p_j = P\{X = j\}$
then we could generate U and do the following: 
If U < 0.20 set X = 1 and stop 
If U < 0.35 set X =2 and stop 
If U < 0.60 set X =3 and stop 
Otherwise set X = 4 
However, a more efficient procedure is the following: 
If U < 0.40 set X = 4 and stop 
If U<0.65 set X =3 and stop 
If U < 0.85 set X = 1 and stop 
Otherwise set X = 2 □
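The saving in Example 4a can be quantified. When the last value is taken by "Otherwise", a value checked in position $k$ of $K$ costs $\min(k, K - 1)$ comparisons. The helper below (a hypothetical name, not from the text) computes the expected number of comparisons for a given search order. It gives 2.45 for the order 1, 2, 3, 4 and 1.95 for the order 4, 3, 1, 2.

```python
def expected_comparisons(probs_in_search_order):
    """Expected number of 'If U < ...' tests when values are checked in the
    given order and the last value is taken by 'Otherwise' without a test."""
    k = len(probs_in_search_order)
    total = 0.0
    for position, p in enumerate(probs_in_search_order, start=1):
        total += p * min(position, k - 1)
    return total

print(expected_comparisons([0.20, 0.15, 0.25, 0.40]))  # about 2.45 (order 1, 2, 3, 4)
print(expected_comparisons([0.40, 0.25, 0.20, 0.15]))  # about 1.95 (order 4, 3, 1, 2)
```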
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One case where it is not necessary to search for the appropriate interval 
in which the random number lies is when the desired random variable is the 
discrete uniform random variable. That is, suppose we want to generate the value of $X$ which is equally likely to take on any of the values $1, \ldots, n$, so that $P\{X = j\} = 1/n$, $j = 1, \ldots, n$. Using the preceding results it follows that we can accomplish this by generating $U$ and then setting

$$X = j \quad \text{if} \quad \frac{j-1}{n} \le U < \frac{j}{n}$$

Therefore, $X$ will equal $j$ if $j - 1 \le nU < j$; or, in other words,

$$X = \text{Int}(nU) + 1$$

where $\text{Int}(x)$, sometimes written as $[x]$, is the integer part of $x$ (i.e., the largest integer less than or equal to $x$).

Discrete uniform random variables are quite important in simulation, as is indicated in the following two examples.
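In Python, `int()` truncates toward zero, which coincides with $\text{Int}(\cdot)$ for the nonnegative values $nU$, so $X = \text{Int}(nU) + 1$ is one line. This is a sketch: the function name and the die-rolling check with $n = 6$ are illustrative.

```python
import random

def discrete_uniform(n, u):
    """Map a random number u in [0, 1) to X = Int(nU) + 1, uniform on 1, ..., n."""
    return int(n * u) + 1

rng = random.Random(5)
counts = [0] * 7                      # index 0 unused; counts[j] tallies X = j
for _ in range(60_000):
    counts[discrete_uniform(6, rng.random())] += 1
print(counts[1:])                     # each entry should be roughly 10,000
```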


Example 4b Generating a Random Permutation Suppose we are interested in generating a permutation of the numbers $1, 2, \ldots, n$ which is such that all $n!$ possible orderings are equally likely. The following algorithm will accomplish this by first choosing one of the numbers $1, \ldots, n$ at random and then putting that number in position $n$; it then chooses at random one of the remaining $n - 1$ numbers and puts that number in position $n - 1$; it then chooses at random one of the remaining $n - 2$ numbers and puts it in position $n - 2$; and so on (where choosing a number at random means that each of the remaining numbers is equally likely to be chosen). However, so that we do not have to consider exactly which of the numbers remain to be positioned, it is convenient and efficient to keep the numbers in an ordered list and then randomly choose the position of the number rather than the number itself. That is, starting with any initial ordering $P_1, P_2, \ldots, P_n$, we pick one of the positions $1, \ldots, n$ at random and then interchange the number in that position with the one in position $n$. Now we randomly choose one of the positions $1, \ldots, n - 1$ and interchange the number in this position with the one in position $n - 1$, and so on.

Recalling that $\text{Int}(kU) + 1$ will be equally likely to take on any of the values $1, 2, \ldots, k$, we see that the above algorithm for generating a random permutation can be written as follows:

STEP 1: Let $P_1, P_2, \ldots, P_n$ be any permutation of $1, 2, \ldots, n$ (e.g., we can choose $P_j = j$, $j = 1, \ldots, n$).

STEP 2: Set $k = n$.

STEP 3: Generate a random number $U$ and let $I = \text{Int}(kU) + 1$.

STEP 4: Interchange the values of $P_I$ and $P_k$.

STEP 5: Let $k = k - 1$ and if $k > 1$ go to Step 3.

STEP 6: $P_1, \ldots, P_n$ is the desired random permutation.


For instance, suppose $n = 4$ and the initial permutation is 1, 2, 3, 4. If the first value of $I$ (which is equally likely to be either 1, 2, 3, or 4) is $I = 3$, then the elements in positions 3 and 4 are interchanged and so the new permutation is 1, 2, 4, 3. If the next value of $I$ is $I = 2$, then the elements in positions 2 and 3 are interchanged and so the new permutation is 1, 4, 2, 3. If the final value of $I$ is $I = 2$, then the final permutation is 1, 4, 2, 3, and this is the value of the random permutation. □
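Steps 1 through 6 translate into the Python sketch below. Positions in the text are 1-based while Python lists are 0-based, hence the `- 1` offsets. Passing the random-number source as a function (an assumption of this sketch) makes it possible to replay the worked example: the random numbers 0.6, 0.5, 0.7 give $I = 3, 2, 2$.

```python
import random

def random_permutation(n, uniform):
    """Steps 1-6: return a random permutation of 1, ..., n.

    `uniform` is a zero-argument function returning a random number in [0, 1).
    Position j of the text is list index j - 1.
    """
    P = list(range(1, n + 1))                  # Step 1: P_j = j
    k = n                                      # Step 2
    while k > 1:                               # Step 5: repeat while k > 1
        I = int(k * uniform()) + 1             # Step 3: I uniform on 1, ..., k
        P[I - 1], P[k - 1] = P[k - 1], P[I - 1]   # Step 4: interchange P_I and P_k
        k -= 1
    return P                                   # Step 6

# Replay the worked example: these random numbers give I = 3, 2, 2.
scripted = iter([0.6, 0.5, 0.7])
example = random_permutation(4, lambda: next(scripted))
print(example)   # [1, 4, 2, 3]

rng = random.Random(11)
print(random_permutation(10, rng.random))
```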


One very important property of the preceding algorithm is that it can also be used to generate a random subset, say of size $r$, of the integers $1, \ldots, n$. Namely, just follow the algorithm until the positions $n, n-1, \ldots, n-r+1$ are filled. The elements in these positions constitute the random subset. (In doing this we can always suppose that $r \le n/2$; for if $r > n/2$ then we could choose a random subset of size $n - r$ and let the elements not in this subset be the random subset of size $r$.)
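A sketch of the random-subset variant: run only the first $r$ interchanges of the permutation algorithm and read off positions $n - r + 1, \ldots, n$, switching to the complement when $r > n/2$. The function name and the 1000-volunteer usage line (anticipating the medical-trial discussion that follows) are illustrative.

```python
import random

def random_subset(n, r, uniform):
    """Return a random subset of size r of {1, ..., n}.

    Runs the permutation algorithm only until positions n, n-1, ..., n-r+1 are
    filled; if r > n/2 it picks the n - r excluded elements instead.
    """
    if r > n / 2:
        excluded = set(random_subset(n, n - r, uniform))
        return [j for j in range(1, n + 1) if j not in excluded]
    P = list(range(1, n + 1))
    for k in range(n, n - r, -1):              # k = n, n-1, ..., n-r+1
        I = int(k * uniform()) + 1             # I uniform on 1, ..., k
        P[I - 1], P[k - 1] = P[k - 1], P[I - 1]
    return P[n - r:]                           # positions n-r+1, ..., n

rng = random.Random(3)
treatment = random_subset(1000, 500, rng.random)
print(len(treatment), len(set(treatment)))     # 500 500
```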

It should be noted that the ability to generate a random subset is particularly 
important in medical trials. For instance, suppose that a medical center is planning 
to test a new drug designed to reduce its user’s blood cholesterol level. To test its 
effectiveness, the medical center has recruited 1000 volunteers to be subjects in 
the test. To take into account the possibility that the subjects’ blood cholesterol 
levels may be affected by factors external to the test (such as changing weather 
conditions), it has been decided to split the volunteers into two groups of size 500: a treatment group that will be given the drug and a control group that will be given a placebo. Neither the volunteers nor the administrators of the drug will be told who is in each group (such a test is called double-blind). It remains to
determine which of the volunteers should be chosen to constitute the treatment 
group. Clearly, one would want the treatment group and the control group to be 
as similar as possible in all respects with the exception that members in the first 
group are to receive the drug while those in the other group receive a placebo, 
for then it would be possible to conclude that any difference in response between 
the groups is indeed due to the drug. There is general agreement that the best 
way to accomplish this is to choose the 500 volunteers to be in the treatment 
group in a completely random fashion. That is, the choice should be made so that each of the $\binom{1000}{500}$ subsets of 500 volunteers is equally likely to constitute the set of volunteers in the treatment group.


Remarks Another way to generate a random permutation is to generate $n$ random numbers $U_1, \ldots, U_n$, order them, and then use the indices of the successive values as the random permutation. For instance, if $n = 4$, and $U_1 = 0.4$, $U_2 = 0.1$, $U_3 = 0.8$, $U_4 = 0.7$, then, because $U_2 < U_1 < U_4 < U_3$, the random permutation is 2, 1, 4, 3. The difficulty with this approach, however, is that ordering the random numbers typically requires on the order of $n \log(n)$ comparisons. □
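For comparison, the sort-based approach in the remark is a one-liner in Python. This is a sketch: `sorted` performs the roughly $n \log(n)$ comparisons mentioned in the remark, whereas Steps 1 through 6 use only $n - 1$ random numbers and interchanges. The function name is ours.

```python
import random

def permutation_by_sorting(us):
    """Return the indices 1, ..., n listed in increasing order of us[i-1]."""
    return sorted(range(1, len(us) + 1), key=lambda i: us[i - 1])

print(permutation_by_sorting([0.4, 0.1, 0.8, 0.7]))   # [2, 1, 4, 3]

rng = random.Random(8)
perm = permutation_by_sorting([rng.random() for _ in range(10)])
```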
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Example 4c Calculating Averages Suppose we want to approxi- 


mate Z= };_; a(i)/n, where n is large and the values a(i), i = 1,...,n, are 
complicated and not easily calculated. One way to accomplish this is to note that 
if X is a discrete uniform random variable over the integers 1, ..., n, then the 
random variable a(X) has a mean given by 

E n o adli) = 

E[a(X)] = X a(i)P{X =i}= >. as a 

i=l i=l 

Hence, if we generate k discrete uniform random variables X;, i = 1, . . . , k—by 


generating k random numbers U; and setting X; = Int(nU;) + 1—then each of the 
k random variables a(X;) will have mean Z, and so by the strong law of large 
numbers it follows that when k is large (though much smaller than n) the average 
of these values should approximately equal Z. Hence, we can approximate a by 
using 


k X. 
gay aed 7 


i=l 
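A minimal sketch of Example 4c, assuming `a` is any Python function defined on $1, \ldots, n$. For the check we use the cheap stand-in $a(i) = i$, whose exact average $(n + 1)/2$ is known. In the intended use, $a(i)$ would be expensive and only $k \ll n$ evaluations are made.

```python
import random

def estimate_average(a, n, k, rng):
    """Estimate (1/n) * sum_{i=1}^n a(i) by (1/k) * sum_{i=1}^k a(X_i),
    where X_i = Int(n U_i) + 1 are discrete uniform on 1, ..., n."""
    return sum(a(int(n * rng.random()) + 1) for _ in range(k)) / k

n = 1_000_000
rng = random.Random(4)
approx = estimate_average(lambda i: i, n, 20_000, rng)   # stand-in a(i) = i
print(approx, (n + 1) / 2)
```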


Another random variable that can be generated without needing to search for 
the relevant interval in which the random number falls is the geometric. 


Example 4d Recall that $X$ is said to be a geometric random variable with parameter $p$ if

$$P\{X = i\} = pq^{i-1}, \qquad i \ge 1, \quad \text{where } q = 1 - p$$

$X$ can be thought of as representing the time of the first success when independent trials, each of which is a success with probability $p$, are performed. Since

$$\begin{aligned}
\sum_{i=1}^{j-1} P\{X = i\} &= 1 - P\{X > j - 1\} \\
&= 1 - P\{\text{first } j - 1 \text{ trials are all failures}\} \\
&= 1 - q^{j-1}, \qquad j \ge 1
\end{aligned}$$

we can generate the value of $X$ by generating a random number $U$ and setting $X$ equal to that value $j$ for which

$$1 - q^{j-1} \le U < 1 - q^j$$

or, equivalently, for which

$$q^j < 1 - U \le q^{j-1}$$
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That is, we can define X by 
X =Min{j: q/ <1-U} 


Hence, using the fact that the logarithm is a monotone function, and so a < b is 
equivalent to log(a) < log(b), we obtain that X can be expressed as 


X = Min{j:j log(qg) < log(1 — U)} 


Lan f.. ._ log —U) 
=Min |j: j> loga) | 


where the last inequality changed sign because log(q) is negative for 0 < q < 1. 
Hence, using Int( ) notation we can express X as 


x=m (ECO) 41 


Finally, by noting that 1—U is also uniformly distributed on (0, 1), it follows 


that 
tre ( os 


is also geometric with parameter p. o 
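The closed form needs only one random number and one logarithm per geometric value. Because `random()` can return 0.0, where the logarithm is undefined, the sketch feeds it `1.0 - rng.random()`, which lies in $(0, 1]$ and is also uniform. The function assumes $0 < p < 1$, and its name is our own.

```python
import math
import random

def geometric(p, u):
    """Return Int(log(u) / log(q)) + 1 with q = 1 - p, for u in (0, 1] and 0 < p < 1."""
    return int(math.log(u) / math.log(1.0 - p)) + 1

rng = random.Random(6)
p = 0.3
draws = [geometric(p, 1.0 - rng.random()) for _ in range(100_000)]
print(sum(draws) / len(draws))   # should be near 1/p = 3.33...
```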


Example 4e Generating a Sequence of Independent Bernoulli 
Random Variables Suppose that you want to generate n independent and 
identically distributed Bernoulli random variables $X_1, \ldots, X_n$ with parameter $p$. While this is easily accomplished by generating $n$ random numbers $U_1, \ldots, U_n$ and then setting

$$X_i = \begin{cases} 1, & \text{if } U_i < p \\ 0, & \text{if } U_i \ge p \end{cases}$$

we will now develop a more efficient approach. To do so, imagine these random variables represent the result of sequential trials, with trial $i$ being a success if $X_i = 1$ or a failure otherwise. To generate these trials when $p \le 1/2$, use the result of Example 4d to generate the geometric random variable $N$, equal to the trial number of the first success when all trials have success probability $p$. Suppose the simulated value of $N$ is $N = j$. If $j > n$, set $X_i = 0$, $i = 1, \ldots, n$; if $j \le n$, set $X_1 = \cdots = X_{j-1} = 0$, $X_j = 1$; and, if $j < n$, repeat the preceding operation to obtain the values of the remaining $n - j$ Bernoulli random variables.
(When p > 1/2, because we want to simultaneously generate as many Bernoulli 
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variables as possible, we should generate the trial number of the first failure 
rather than that of the first success.) 
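A sketch of this scheme in Python. It jumps from one "rare" outcome to the next with the geometric formula of Example 4d, tracking successes when $p \le 1/2$ and failures otherwise, so it uses roughly $n \min(p, 1 - p)$ random numbers instead of $n$. The function name is illustrative.

```python
import math
import random

def bernoulli_sequence(n, p, rng):
    """Generate n i.i.d. Bernoulli(p) values by jumping from one rare outcome
    to the next with geometric random variables (Example 4d)."""
    # Track successes when p <= 1/2 and failures when p > 1/2.
    rare_value = 1 if p <= 0.5 else 0
    rare_prob = p if p <= 0.5 else 1.0 - p
    xs = [1 - rare_value] * n              # every trial starts at the common value
    if rare_prob == 0.0:
        return xs
    log_q = math.log(1.0 - rare_prob)
    position = 0                           # trials 1, ..., position are decided
    while True:
        u = 1.0 - rng.random()             # random number in (0, 1]
        position += int(math.log(u) / log_q) + 1   # trial number of the next rare outcome
        if position > n:
            return xs
        xs[position - 1] = rare_value

rng = random.Random(10)
xs = bernoulli_sequence(200_000, 0.1, rng)
print(sum(xs) / len(xs))   # near 0.1
```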

The preceding idea can also be applied when the $X_i$ are independent but not identically distributed Bernoulli random variables. For each $i = 1, \ldots, n$, let $u_i$ be the least likely of the two possible values of $X_i$. That is, $u_i = 1$ if $P\{X_i = 1\} \le 1/2$, and $u_i = 0$ otherwise. Also, let $p_i = P\{X_i = u_i\}$ and let $q_i = 1 - p_i$. We will simulate the sequence of Bernoullis by first generating the value of $X$, where for $j = 1, \ldots, n$, $X$ will equal $j$ when trial $j$ is the first trial that results in an unlikely value, and $X$ will equal $n + 1$ if none of the $n$ trials results in its unlikely value. To generate $X$, let $q_{n+1} = 0$ and note that

$$P\{X > j\} = \prod_{i=1}^{j} q_i, \qquad j = 1, \ldots, n + 1$$

Thus,

$$P\{X \le j\} = 1 - \prod_{i=1}^{j} q_i, \qquad j = 1, \ldots, n + 1$$

Consequently, we can simulate $X$ by generating a random number $U$ and then setting

$$X = \min\left\{j : U \le 1 - \prod_{i=1}^{j} q_i\right\}$$

If $X = n + 1$, the simulated sequence of Bernoulli random variables is $X_i = 1 - u_i$, $i = 1, \ldots, n$. If $X = j$, $j \le n$, set $X_i = 1 - u_i$, $i = 1, \ldots, j - 1$, $X_j = u_j$; if $j < n$ then generate the remaining values $X_{j+1}, \ldots, X_n$ in a similar fashion.
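A Python sketch of the non-identical case. Each random number $U$ locates the next trial that takes its unlikely value by scanning forward for the first $j$ with $U < 1 - q_{\text{start}} \cdots q_j$; if no such $j$ remains, the corresponding event $X = n + 1$ occurs. The sketch uses a strict inequality because `random()` returns values in $[0, 1)$, so a trial with $q_j = 1$ can never be selected. The function name and the test probabilities are illustrative assumptions.

```python
import random

def nonidentical_bernoullis(probs, rng):
    """Generate independent X_1, ..., X_n with P{X_i = 1} = probs[i-1].

    u_i is the less likely value of X_i and q_i = 1 - P{X_i = u_i}.
    """
    n = len(probs)
    unlikely = [1 if p <= 0.5 else 0 for p in probs]                 # u_i
    q = [1.0 - p if u == 1 else p for p, u in zip(probs, unlikely)]  # q_i
    xs = [1 - u for u in unlikely]       # default: each X_i takes its likely value
    start = 0                            # first undecided trial (0-based index)
    while start < n:
        U = rng.random()
        prod = 1.0
        j = start
        while j < n:
            prod *= q[j]
            if U < 1.0 - prod:           # trial j is the first unlikely one
                break
            j += 1
        if j == n:                       # no further unlikely values
            break
        xs[j] = unlikely[j]
        start = j + 1                    # continue with the remaining trials
    return xs

rng = random.Random(17)
probs = [0.1, 0.9, 0.5, 0.0, 1.0, 0.3]
runs = [nonidentical_bernoullis(probs, rng) for _ in range(20_000)]
means = [sum(r[i] for r in runs) / len(runs) for i in range(len(probs))]
print(means)   # close to probs
```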


4.2 Generating a Poisson Random Variable 


The random variable $X$ is Poisson with mean $\lambda$ if

$$p_i = P\{X = i\} = e^{-\lambda}\frac{\lambda^i}{i!}, \qquad i = 0, 1, \ldots$$

The key to using the inverse transform method to generate such a random variable is the following identity (proved in Section 2.8 of Chapter 2):

$$p_{i+1} = \frac{\lambda}{i+1}\,p_i, \qquad i \ge 0 \qquad (4.1)$$

Upon using the above recursion to compute the Poisson probabilities as they become needed, the inverse transform algorithm for generating a Poisson random variable with mean $\lambda$ can be expressed as follows. (The quantity $i$ refers to the value presently under consideration; $p = p_i$ is the probability that $X$ equals $i$, and $F = F(i)$ is the probability that $X$ is less than or equal to $i$.)
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STEP 1: Generate a random number U. 

STEP 2: $i = 0$, $p = e^{-\lambda}$, $F = p$.

STEP 3: If $U < F$, set $X = i$ and stop.
STEP 4: $p = \lambda p/(i + 1)$, $F = F + p$, $i = i + 1$.
STEP 5: Go to Step 3. 


(In the above it should be noted that when we write, for example, $i = i + 1$, we do not mean that $i$ is equal to $i + 1$ but rather that the value of $i$ should be increased by 1.) To see that the above algorithm does indeed generate a Poisson random variable with mean $\lambda$, note that it first generates a random number $U$ and then checks whether or not $U < e^{-\lambda} = p_0$. If so, it sets $X = 0$. If not, then it computes (in Step 4) $p_1$ by using the recursion (4.1). It now checks whether $U < p_0 + p_1$ (where the right-hand side is the new value of $F$), and if so it sets $X = 1$, and so on.
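As an illustration, Steps 1 to 5 translate almost line for line into the Python sketch below. The function name is ours, and the sketch assumes a moderate $\lambda$, since $e^{-\lambda}$ underflows to 0 in double precision once $\lambda$ exceeds roughly 745.

```python
import math
import random


def poisson_inverse_transform(lam, rng=random):
    """Steps 1-5: inverse transform for a Poisson random variable with mean lam."""
    U = rng.random()               # Step 1
    i = 0                          # Step 2
    p = math.exp(-lam)             #   p = p_0 = e^{-lam}
    F = p                          #   F = F(0)
    # Step 3: stop as soon as U < F. The extra p > 0 test only guards
    # against floating-point round-off leaving F slightly below 1.
    while U >= F and p > 0:
        p = lam * p / (i + 1)      # Step 4: recursion (4.1)
        F += p
        i += 1                     # Step 5: back to the Step 3 check
    return i
```

For example, with $\lambda = 2$ we have $F(0) \approx 0.135$, $F(1) \approx 0.406$, $F(2) \approx 0.677$, so $U = 0.5$ yields $X = 2$ after three comparisons.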

The above algorithm successively checks whether the Poisson value is 0, then whether it is 1, then 2, and so on. Thus, the number of comparisons needed will be 1 greater than the generated value of the Poisson. Hence, on average, the above will need to make $1 + \lambda$ searches. Whereas this is fine when $\lambda$ is small, it can be greatly improved upon when $\lambda$ is large. Indeed, since a Poisson random variable with mean $\lambda$ is most likely to take on one of the two integral values closest to $\lambda$, a more efficient algorithm would first check one of these values, rather than starting at 0 and working upward. For instance, let $I = \text{Int}(\lambda)$ and use Equation (4.1) to recursively determine $F(I)$. Now generate a Poisson random variable $X$ with mean $\lambda$ by generating a random number $U$, noting whether or not $X \le I$ by seeing whether or not $U < F(I)$. Then search downward starting from $I$ in the case where $X \le I$ and upward starting from $I + 1$ otherwise.
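One possible Python rendering of this search-from-the-middle idea is sketched below. The name `poisson_search_from_mode` is ours. Going downward it inverts recursion (4.1) as $p_{i-1} = i\,p_i/\lambda$, and it makes the same moderate-$\lambda$ assumption as before, because it still starts from $e^{-\lambda}$ to compute $F(I)$.

```python
import math
import random


def poisson_search_from_mode(lam, rng=random):
    """Inverse transform for Poisson(lam) that starts searching at I = Int(lam)."""
    I = int(lam)
    # Use recursion (4.1) to compute p = p_I and F = F(I).
    p = math.exp(-lam)
    F = p
    for k in range(I):
        p = lam * p / (k + 1)
        F += p
    U = rng.random()
    i = I
    if U < F:
        # X <= I: search downward; stop once U >= F(i - 1) = F - p.
        while i > 0 and U < F - p:
            F -= p               # F becomes F(i - 1)
            p = i * p / lam      # p becomes p_{i-1}, inverting (4.1)
            i -= 1
    else:
        # X > I: search upward starting from I + 1.
        while U >= F and p > 0:  # p > 0 guards against round-off in the tail
            p = lam * p / (i + 1)
            F += p
            i += 1
    return i
```

For any fixed $U$, this returns the same value as the plain inverse transform algorithm, because both invert the same $F$. Only the order in which the candidate values are examined differs.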

The number of searches needed by this algorithm is roughly 1 more than the absolute difference between the random variable $X$ and its mean $\lambda$. Since for $\lambda$ large a Poisson is (by the central limit theorem) approximately normal with mean and variance both equal to $\lambda$, it follows that

$$\begin{aligned}
\text{Average number of searches} &\approx 1 + E[|X - \lambda|] \qquad \text{where } X \sim N(\lambda, \lambda)^{*} \\
&= 1 + \sqrt{\lambda}\, E\left[\frac{|X - \lambda|}{\sqrt{\lambda}}\right] \\
&= 1 + \sqrt{\lambda}\, E[|Z|] \qquad \text{where } Z \sim N(0, 1) \\
&= 1 + 0.798\sqrt{\lambda} \qquad \text{(see Exercise 11)}
\end{aligned}$$

That is, using Algorithm 4-1, the average number of searches grows with the square root of $\lambda$ rather than with $\lambda$ as $\lambda$ becomes larger and larger.


*We use the notation $X \sim F$ to mean that $X$ has distribution function $F$. The symbol $N(\mu, \sigma^2)$ stands for a normal distribution with mean $\mu$ and variance $\sigma^2$.

4.3 Generating Binomial Random Variables


Suppose we want to generate the value of a binomial $(n, p)$ random variable $X$, that is, $X$ is such that

$$P\{X = i\} = \frac{n!}{i!\,(n - i)!}\, p^i (1 - p)^{n - i}, \qquad i = 0, 1, \ldots, n$$

To do so, we employ the inverse transform method by making use of the recursive identity

$$P\{X = i + 1\} = \frac{n - i}{i + 1}\, \frac{p}{1 - p}\, P\{X = i\}$$


With $i$ denoting the value currently under consideration, $pr = P\{X = i\}$ the probability that $X$ is equal to $i$, and $F = F(i)$ the probability that $X$ is less than or equal to $i$, the algorithm can be expressed as follows:


Inverse Transform Algorithm for Generating a Binomial
(n, p) Random Variable


STEP 1: Generate a random number $U$.

STEP 2: $c = p/(1 - p)$, $i = 0$, $pr = (1 - p)^n$, $F = pr$.
STEP 3: If $U < F$, set $X = i$ and stop.

STEP 4: $pr = [c(n - i)/(i + 1)]\,pr$, $F = F + pr$, $i = i + 1$.
STEP 5: Go to Step 3.


The preceding algorithm first checks whether $X = 0$, then whether $X = 1$, and so on. Hence, the number of searches it makes is 1 more than the value of $X$. Therefore, on average, it will take $1 + np$ searches to generate $X$. Since a binomial $(n, p)$ random variable represents the number of successes in $n$ independent trials when each is a success with probability $p$, it follows that such a random variable can also be generated by subtracting from $n$ the value of a binomial $(n, 1 - p)$ random variable (why is that?). Hence, when $p > 1/2$, we can generate a binomial $(n, 1 - p)$ random variable by the above method and subtract its value from $n$ to obtain the desired generation.
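One way to render the algorithm in Python, including the $n - \text{binomial}(n, 1 - p)$ device for $p > 1/2$, is sketched below. The function name is ours. The extra `i < n` test is only a guard against floating-point round-off keeping $F$ just below 1.

```python
import random


def binomial_inverse_transform(n, p, rng=random):
    """Steps 1-5 for a binomial (n, p) random variable."""
    if p > 0.5:
        # Count failures instead: n minus a binomial (n, 1 - p).
        return n - binomial_inverse_transform(n, 1.0 - p, rng)
    U = rng.random()                       # Step 1
    c = p / (1.0 - p)                      # Step 2
    i = 0
    pr = (1.0 - p) ** n                    #   pr = P{X = 0}
    F = pr
    while U >= F and i < n:                # Step 3
        pr = c * (n - i) / (i + 1) * pr    # Step 4: the recursive identity
        F += pr
        i += 1                             # Step 5: back to Step 3
    return i
```

With $n = 3$, $p = 0.5$ the cumulative probabilities are 0.125, 0.5, 0.875, 1, so $U = 0.3$ yields $X = 1$.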


Remarks 


1. Another way of generating a binomial $(n, p)$ random variable $X$ is by utilizing its interpretation as the number of successes in $n$ independent Bernoulli trials, when each trial is a success with probability $p$. Consequently, we can also simulate $X$ by generating the outcomes of these $n$ Bernoulli trials.

2. As in the Poisson case, when the mean $np$ is large it is better to first determine if the generated value is less than or equal to $J = \text{Int}(np)$ or whether it is greater than $J$. In the former case one should then start the search with $J$, then $J - 1, \ldots$, and so on; whereas in the latter case one should start searching with $J + 1$ and go upward. □


4.4 The Acceptance-Rejection Technique 


Suppose we have an efficient method for simulating a random variable having probability mass function $\{q_j, j \ge 0\}$. We can use this as the basis for simulating from the distribution having mass function $\{p_j, j \ge 0\}$ by first simulating a random variable $Y$ having mass function $\{q_j\}$ and then accepting this simulated value with a probability proportional to $p_Y/q_Y$.

Specifically, let $c$ be a constant such that

$$\frac{p_j}{q_j} \le c \quad \text{for all } j \text{ such that } p_j > 0 \tag{4.2}$$

We now have the following technique, called the rejection method or the acceptance-rejection method, for simulating a random variable $X$ having mass function $p_j = P\{X = j\}$.


Rejection Method


STEP 1: Simulate the value of $Y$, having probability mass function $q_j$.
STEP 2: Generate a random number $U$.
STEP 3: If $U < p_Y/cq_Y$, set $X = Y$ and stop. Otherwise, return to Step 1.


The rejection method is pictorially represented in Figure 4.1. 
We now prove that the rejection method works. 


Theorem The acceptance-rejection algorithm generates a random variable $X$ such that

$$P\{X = j\} = p_j, \qquad j = 0, \ldots$$

In addition, the number of iterations of the algorithm needed to obtain $X$ is a geometric random variable with mean $c$.

[Figure 4.1. Acceptance-rejection: start; generate $Y$ with mass function $q_j$; generate $U$; if $U \le p_Y/cq_Y$, set $X = Y$ and stop; otherwise start again.]


Proof To begin, let us determine the probability that a single iteration produces the accepted value $j$. First note that

$$P\{Y = j, \text{ it is accepted}\} = P\{Y = j\}\,P\{\text{accept} \mid Y = j\} = q_j\,\frac{p_j}{c\,q_j} = \frac{p_j}{c}$$

Summing over $j$ yields the probability that a generated random variable is accepted:

$$P\{\text{accepted}\} = \sum_j \frac{p_j}{c} = \frac{1}{c}$$

As each iteration independently results in an accepted value with probability $1/c$, we see that the number of iterations needed is geometric with mean $c$. Also,

$$P\{X = j\} = \sum_n P\{j \text{ accepted on iteration } n\} = \sum_n \left(1 - \frac{1}{c}\right)^{n-1} \frac{p_j}{c} = p_j \qquad \square$$

Remark The reader should note that the way in which we "accept the value $Y$ with probability $p_Y/cq_Y$" is by generating a random number $U$ and then accepting $Y$ if $U < p_Y/cq_Y$.


Example 4f Suppose we wanted to simulate the value of a random variable $X$ that takes one of the values $1, 2, \ldots, 10$ with respective probabilities 0.11, 0.12, 0.09, 0.08, 0.12, 0.10, 0.09, 0.09, 0.10, 0.10. Whereas one possibility is to use the inverse transform algorithm, another approach is to use the rejection method with $q$ being the discrete uniform density on $1, \ldots, 10$. That is, $q_j = 1/10$, $j = 1, \ldots, 10$. For this choice of $\{q_j\}$ we can choose $c$ by

$$c = \max_j \frac{p_j}{q_j} = 1.2$$

and so the algorithm would be as follows:
STEP 1: Generate a random number $U_1$, and set $Y = \text{Int}(10U_1) + 1$.
STEP 2: Generate a second random number $U_2$.
STEP 3: If $U_2 < p_Y/0.12$, set $X = Y$ and stop. Otherwise return to Step 1.


The constant 0.12 in Step 3 arises since $cq_Y = 1.2/10 = 0.12$. On average, this algorithm requires only 1.2 iterations to obtain the generated value of $X$. □
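The three steps of Example 4f can be sketched in Python as follows. The function name is ours, and returning the iteration count alongside $X$ is an addition for checking the claimed mean of 1.2 iterations.

```python
import random

# Target probabilities p_1, ..., p_10 of Example 4f (index 0 unused).
P = [None, 0.11, 0.12, 0.09, 0.08, 0.12, 0.10, 0.09, 0.09, 0.10, 0.10]


def example_4f(rng=random):
    """Rejection with q_j = 1/10 and c = 1.2; returns (X, iterations)."""
    iterations = 0
    while True:
        iterations += 1
        Y = int(10 * rng.random()) + 1   # Step 1: Y uniform on 1, ..., 10
        U2 = rng.random()                # Step 2
        if U2 < P[Y] / 0.12:             # Step 3: accept with prob p_Y/(c q_Y)
            return Y, iterations
```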


The power of the rejection method, a version of which was initially proposed 
by the famous mathematician John von Neumann, will become even more readily 
apparent when we consider its analogue when generating continuous random 
variables. 


4.5 The Composition Approach 


Suppose that we had an efficient method to simulate the value of a random variable having either of the two probability mass functions $\{p_j^{(1)}, j \ge 0\}$ or $\{p_j^{(2)}, j \ge 0\}$, and that we wanted to simulate the value of the random variable $X$ having mass function

$$P\{X = j\} = \alpha p_j^{(1)} + (1 - \alpha) p_j^{(2)}, \qquad j \ge 0 \tag{4.3}$$

where $0 < \alpha < 1$. One way to simulate such a random variable $X$ is to note that if $X_1$ and $X_2$ are random variables having respective mass functions $\{p_j^{(1)}\}$ and $\{p_j^{(2)}\}$, then the random variable $X$ defined by

$$X = \begin{cases} X_1 & \text{with probability } \alpha \\ X_2 & \text{with probability } 1 - \alpha \end{cases}$$

will have its mass function given by (4.3). From this it follows that we can generate the value of such a random variable by first generating a random number $U$ and then generating a value of $X_1$ if $U < \alpha$ and of $X_2$ if $U > \alpha$.


Example 4g Suppose we want to generate the value of a random variable $X$ such that

$$p_j = P\{X = j\} = \begin{cases} 0.05 & \text{for } j = 1, 2, 3, 4, 5 \\ 0.15 & \text{for } j = 6, 7, 8, 9, 10 \end{cases}$$

By noting that $p_j = 0.5\,p_j^{(1)} + 0.5\,p_j^{(2)}$, where

$$p_j^{(1)} = 0.1, \quad j = 1, \ldots, 10, \qquad p_j^{(2)} = \begin{cases} 0 & \text{for } j = 1, 2, 3, 4, 5 \\ 0.2 & \text{for } j = 6, 7, 8, 9, 10 \end{cases}$$

we can accomplish this by first generating a random number $U$ and then generating from the discrete uniform over $1, \ldots, 10$ if $U < 0.5$ and from the discrete uniform over 6, 7, 8, 9, 10 otherwise. That is, we can simulate $X$ as follows:


STEP 1: Generate a random number $U_1$.
STEP 2: Generate a random number $U_2$.
STEP 3: If $U_1 < 0.5$, set $X = \text{Int}(10U_2) + 1$. Otherwise, set $X = \text{Int}(5U_2) + 6$. □
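A direct Python transcription of Steps 1 to 3 is sketched below. The function name is ours.

```python
import random


def example_4g(rng=random):
    """Composition with alpha = 0.5 for the mass function of Example 4g."""
    U1 = rng.random()              # Step 1: chooses which component to use
    U2 = rng.random()              # Step 2: drives the chosen component
    if U1 < 0.5:
        return int(10 * U2) + 1    # discrete uniform on 1, ..., 10
    return int(5 * U2) + 6         # discrete uniform on 6, ..., 10
```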


If $F_i$, $i = 1, \ldots, n$, are distribution functions and $\alpha_i$, $i = 1, \ldots, n$, are nonnegative numbers summing to 1, then the distribution function $F$ given by

$$F(x) = \sum_{i=1}^{n} \alpha_i F_i(x)$$

is said to be a mixture, or a composition, of the distribution functions $F_i$, $i = 1, \ldots, n$. One way to simulate from $F$ is first to simulate a random variable $I$, equal to $i$ with probability $\alpha_i$, $i = 1, \ldots, n$, and then to simulate from the distribution $F_I$. (That is, if the simulated value of $I$ is $I = j$, then the second simulation is from $F_j$.) This approach to simulating from $F$ is often referred to as the composition method.
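A generic form of the composition method can be sketched in Python as follows. The name `sample_composition` and the convention that each component is a callable taking the random source are our own choices. The index $I$ is chosen by a small inverse transform over the weights $\alpha_i$.

```python
import random


def sample_composition(alphas, samplers, rng=random):
    """Choose I = i with probability alphas[i], then return samplers[I](rng)."""
    U = rng.random()
    cumulative = 0.0
    for alpha, sampler in zip(alphas, samplers):
        cumulative += alpha
        if U < cumulative:
            return sampler(rng)
    # Reached only if round-off leaves sum(alphas) slightly below U.
    return samplers[-1](rng)
```

For instance, `sample_composition([0.5, 0.5], [lambda r: int(10 * r.random()) + 1, lambda r: int(5 * r.random()) + 6])` reproduces the mass function of Example 4g.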


4.6 Generating Random Vectors 


A random vector $X_1, \ldots, X_n$ can be simulated by sequentially generating the $X_i$. That is, first generate $X_1$; then generate $X_2$ from its conditional distribution given the generated value of $X_1$; then generate $X_3$ from its conditional distribution given the generated values of $X_1$ and $X_2$; and so on. This is illustrated in Example 4h, which shows how to simulate a random vector having a multinomial distribution.


Example 4h Consider $n$ independent trials, each of which results in one of the outcomes $1, 2, \ldots, r$ with respective probabilities $p_1, p_2, \ldots, p_r$, $\sum_{i=1}^{r} p_i = 1$. If $X_i$ denotes the number of trials that result in outcome $i$, then the random vector $(X_1, \ldots, X_r)$ is said to be a multinomial random vector. Its joint probability mass function is given by

$$P\{X_i = x_i,\ i = 1, \ldots, r\} = \frac{n!}{x_1! \cdots x_r!}\, p_1^{x_1} \cdots p_r^{x_r}, \qquad \sum_{i=1}^{r} x_i = n$$

The best way to simulate such a random vector depends on the relative sizes of $r$ and $n$. If $r$ is large relative to $n$, so that many of the outcomes do not occur on any of the trials, then it is probably best to simulate the random variables by generating the outcomes of the $n$ trials. That is, first generate independent random variables $Y_1, \ldots, Y_n$ such that

$$P\{Y_j = i\} = p_i, \qquad i = 1, \ldots, r, \quad j = 1, \ldots, n$$

and then set

$$X_i = \text{number of } j,\ j = 1, \ldots, n : Y_j = i$$

(That is, the generated value of $Y_j$ represents the result of trial $j$, and $X_i$ is the number of trials that result in outcome $i$.)

On the other hand, if $n$ is large relative to $r$, then $X_1, \ldots, X_r$ can be simulated in sequence. That is, first generate $X_1$, then $X_2$, then $X_3$, and so on. Because each of the $n$ trials independently results in outcome 1 with probability $p_1$, it follows that $X_1$ is a binomial random variable with parameters $(n, p_1)$. Therefore, we can use the method of Section 4.3 to generate $X_1$. Suppose its generated value is $x_1$. Then, given that $x_1$ of the $n$ trials resulted in outcome 1, it follows that each of the other $n - x_1$ trials independently results in outcome 2 with probability

$$P\{2 \mid \text{not } 1\} = \frac{p_2}{1 - p_1}$$

Therefore, the conditional distribution of $X_2$, given that $X_1 = x_1$, is binomial with parameters $\left(n - x_1, \frac{p_2}{1 - p_1}\right)$. Thus, we can again make use of Section 4.3 to generate the value of $X_2$. If the generated value of $X_2$ is $x_2$, then we next need to generate the value of $X_3$ conditional on the results that $X_1 = x_1$, $X_2 = x_2$. However, given there are $x_1$ trials that result in outcome 1 and $x_2$ trials that result in outcome 2, each of the remaining $n - x_1 - x_2$ trials independently results in outcome 3 with probability $\frac{p_3}{1 - p_1 - p_2}$. Consequently, the conditional distribution of $X_3$ given that $X_i = x_i$, $i = 1, 2$, is binomial with parameters $\left(n - x_1 - x_2, \frac{p_3}{1 - p_1 - p_2}\right)$. We then use this fact to generate $X_3$, and continue on until all the values $X_1, \ldots, X_r$ have been generated. □
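The sequential scheme for large $n$ can be sketched in Python as follows. The names `binomial` and `multinomial_sequential` are ours; `binomial` repeats the inverse transform method of Section 4.3 so that the block stands alone. Setting the last coordinate to the number of remaining trials is equivalent to drawing it from a binomial with success probability 1.

```python
import random


def binomial(n, p, rng=random):
    """Inverse transform binomial (n, p), as in Section 4.3."""
    if p > 0.5:
        return n - binomial(n, 1.0 - p, rng)
    U = rng.random()
    c = p / (1.0 - p)
    i, pr = 0, (1.0 - p) ** n
    F = pr
    while U >= F and i < n:
        pr = c * (n - i) / (i + 1) * pr
        F += pr
        i += 1
    return i


def multinomial_sequential(n, probs, rng=random):
    """Generate (X_1, ..., X_r), each X_i from its conditional binomial
    distribution given X_1, ..., X_{i-1}."""
    xs = []
    remaining_n = n      # n - x_1 - ... - x_{i-1}
    remaining_p = 1.0    # 1 - p_1 - ... - p_{i-1}
    for p in probs[:-1]:
        if remaining_n == 0 or remaining_p <= 0.0:
            x = 0
        else:
            # min() protects against round-off pushing the ratio above 1.
            x = binomial(remaining_n, min(1.0, p / remaining_p), rng)
        xs.append(x)
        remaining_n -= x
        remaining_p -= p
    xs.append(remaining_n)  # the last outcome receives the remaining trials
    return xs
```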


Exercises 


1. Write a program to generate $n$ values from the probability mass function

$$p_1 = \frac{1}{3}, \qquad p_2 = \frac{2}{3}$$


(a) Let n = 100, run the program, and determine the proportion of values 
that are equal to 1. 

(b) Repeat (a) with n = 1000. 

(c) Repeat (a) with n = 10,000. 


2. Write a computer program that, when given a probability mass function $\{p_j, j = 1, \ldots, n\}$ as an input, gives as an output the value of a random variable having this mass function.


3. Give an efficient algorithm to simulate the value of a random variable X 
such that 


$$P\{X = 1\} = 0.3, \quad P\{X = 2\} = 0.2, \quad P\{X = 3\} = 0.35, \quad P\{X = 4\} = 0.15$$

4. A deck of 100 cards, numbered $1, 2, \ldots, 100$, is shuffled and then turned over one card at a time. Say that a "hit" occurs whenever card $i$ is the $i$th card to be turned over, $i = 1, \ldots, 100$. Write a simulation program to estimate the expectation and variance of the total number of hits. Run the program. Find the exact answers and compare them with your estimates.


5. Another method of generating a random permutation, different from the one presented in Example 4b, is to successively generate a random permutation of the elements $1, 2, \ldots, n$ starting with $n = 1$, then $n = 2$, and so on. (Of course, the random permutation when $n = 1$ is 1.) Once one has a random permutation of the first $n - 1$ elements, call it $P_1, \ldots, P_{n-1}$, the random permutation of the $n$ elements $1, \ldots, n$ is obtained by putting $n$ in the final position, to obtain the permutation $P_1, \ldots, P_{n-1}, n$, and then interchanging the element in position $n$ (namely, $n$) with the element in a randomly chosen position which is equally likely to be either position 1, position 2, ..., or position $n$.


(a) Write an algorithm that accomplishes the above.

(b) Prove by mathematical induction on $n$ that the algorithm works, in that the permutation obtained is equally likely to be any of the $n!$ permutations of $1, 2, \ldots, n$.


6. Using an efficient procedure, along with the text’s random number sequence, 
generate a sequence of 25 independent Bernoulli random variables, each having 
parameter p = .8. How many random numbers were needed? 


7. A pair of fair dice are to be continually rolled until all the possible outcomes 
2,3,..., 12 have occurred at least once. Develop a simulation study to estimate 
the expected number of dice rolls that are needed. 


8. Suppose that each item on a list of $n$ items has a value attached to it, and let $v(i)$ denote the value attached to the $i$th item on the list. Suppose that $n$ is very large, and also that each item may appear at many different places on the list. Explain how random numbers can be used to estimate the sum of the values of the different items on the list (where the value of each item is to be counted once no matter how many times the item appears on the list).


9. Consider the $n$ events $A_1, \ldots, A_n$ where $A_i$ consists of the following $n_i$ outcomes: $A_i = \{a_{i,1}, a_{i,2}, \ldots, a_{i,n_i}\}$. Suppose that for any given outcome $a$, $P\{a\}$, the probability that the experiment results in outcome $a$, is known. Explain how one can use the results of Exercise 8 to estimate $P\{\bigcup_{i=1}^{n} A_i\}$, the probability that at least one of the events $A_i$ occurs. Note that the events $A_i$, $i = 1, \ldots, n$, are not assumed to be mutually exclusive.


10. The negative binomial probability mass function with parameters $(r, p)$, where $r$ is a positive integer and $0 < p < 1$, is given by

$$p_j = \frac{(j - 1)!}{(j - r)!\,(r - 1)!}\, p^r (1 - p)^{j - r}, \qquad j = r, r + 1, \ldots$$

(a) Use the relationship between negative binomial and geometric random variables and the results of Example 4d to obtain an algorithm for simulating from this distribution.

(b) Verify the relation

$$p_{j+1} = \frac{j(1 - p)}{j + 1 - r}\, p_j$$

(c) Use the relation in part (b) to give a second algorithm for generating negative binomial random variables.

(d) Use the interpretation of the negative binomial distribution as the 
number of trials it takes to amass a total of r successes when each 
trial independently results in a success with probability p, to obtain 
still another approach for generating such a random variable. 


11. If $Z$ is a standard normal random variable, show that

$$E[|Z|] = \left(\frac{2}{\pi}\right)^{1/2} = 0.798$$


12. Give two methods for generating a random variable $X$ such that

$$P\{X = i\} = \frac{e^{-\lambda}\lambda^i/i!}{\sum_{j=0}^{k} e^{-\lambda}\lambda^j/j!}, \qquad i = 0, \ldots, k$$


13. Let $X$ be a binomial random variable with parameters $n$ and $p$. Suppose that we want to generate a random variable $Y$ whose probability mass function is the same as the conditional mass function of $X$ given that $X > k$, for some $k < n$. Let $\alpha = P\{X > k\}$ and suppose that the value of $\alpha$ has been computed.


(a) Give the inverse transform method for generating $Y$.

(b) Give a second method for generating $Y$.

(c) For what values of $\alpha$, small or large, would the algorithm in (b) be inefficient?


14. Give a method for simulating $X$, having the probability mass function $p_j$, $j = 5, 6, \ldots, 14$, where

$$p_j = \begin{cases} 0.11 & \text{when } j \text{ is odd and } 5 \le j \le 13 \\ 0.09 & \text{when } j \text{ is even and } 6 \le j \le 14 \end{cases}$$

Use the text's random number sequence to generate $X$.

15. Suppose that the random variable $X$ can take on any of the values $1, \ldots, 10$ with respective probabilities 0.06, 0.06, 0.06, 0.06, 0.06, 0.15, 0.13, 0.14, 0.15, 0.13. Use the composition approach to give an algorithm that generates the value of $X$. Use the text's random number sequence to generate $X$.


16. Present a method to generate the value of $X$, where

$$P\{X = j\} = \left(\frac{1}{2}\right)^{j+1} + \frac{1}{2}\,\frac{2^{j-1}}{3^j}, \qquad j = 1, 2, \ldots$$

Use the text's random number sequence to generate $X$.


17. Let $X$ have mass function $p_j = P\{X = j\}$, $\sum_{j=1}^{\infty} p_j = 1$. Let

$$\lambda_n = P\{X = n \mid X > n - 1\} = \frac{p_n}{1 - \sum_{j=1}^{n-1} p_j}, \qquad n = 1, \ldots$$

(a) Show that $p_1 = \lambda_1$ and

$$p_n = (1 - \lambda_1)(1 - \lambda_2) \cdots (1 - \lambda_{n-1})\lambda_n$$

The quantities $\lambda_n$, $n \ge 1$, are called the discrete hazard rates, since if we think of $X$ as the lifetime of some item then $\lambda_n$ represents the probability that an item that has reached the age $n$ will die during that time period. The following approach to simulating discrete random variables, called the discrete hazard rate method, generates a succession of random numbers, stopping when the $n$th random number is less than $\lambda_n$. The algorithm can be written as follows:


STEP 1: $X = 1$.

STEP 2: Generate a random number $U$.
STEP 3: If $U < \lambda_X$, stop.

STEP 4: $X = X + 1$.

STEP 5: Go to Step 2.


(b) Show that the value of $X$ when the above stops has the desired mass function.

(c) Suppose that $X$ is a geometric random variable with parameter $p$. Determine the values $\lambda_n$, $n \ge 1$. Explain what the above algorithm is doing in this case and why its validity is clear.


18. Suppose that $0 \le \lambda_n \le \lambda$ for all $n \ge 1$. Consider the following algorithm to generate a random variable having discrete hazard rates $\{\lambda_n\}$.


STEP 1: $S = 0$.

STEP 2: Generate $U$ and set $Y = \text{Int}\left(\frac{\log U}{\log(1 - \lambda)}\right) + 1$.

STEP 3: $S = S + Y$.
STEP 4: Generate $U$.
STEP 5: If $U \le \lambda_S/\lambda$, set $X = S$ and stop. Otherwise, go to 2.


(a) What is the distribution of $Y$ in Step 2?
(b) Explain what the algorithm is doing.
(c) Argue that $X$ is a random variable with discrete hazard rates $\{\lambda_n\}$.


19. A random selection of $m$ balls is to be made from an urn that contains $n$ balls, $n_i$ of which have color type $i$, $\sum_{i=1}^{r} n_i = n$. Discuss efficient procedures for simulating $X_1, \ldots, X_r$, where $X_i$ denotes the number of withdrawn balls that have color type $i$.


Generating Continuous 
Random Variables 


Introduction 


Each of the techniques for generating a discrete random variable has its analogue 
in the continuous case. In Sections 5.1 and 5.2 we present the inverse transform 
approach and the rejection approach for generating continuous random variables. 
In Section 5.3 we consider a powerful approach for generating normal random 
variables, known as the polar method. Finally, in Sections 5.4 and 5.5 we consider 
the problem of generating Poisson and nonhomogeneous Poisson processes. 


5.1 The Inverse Transform Algorithm 
Consider a continuous random variable having distribution function F. A general 
method for generating such a random variable—called the inverse transformation 
method—is based on the following proposition. 


Proposition Let $U$ be a uniform $(0, 1)$ random variable. For any continuous distribution function $F$ the random variable $X$ defined by

$$X = F^{-1}(U)$$

has distribution $F$. [$F^{-1}(u)$ is defined to be that value of $x$ such that $F(x) = u$.]

Proof Let $F_X$ denote the distribution function of $X = F^{-1}(U)$. Then

$$F_X(x) = P\{X \le x\} = P\{F^{-1}(U) \le x\} \tag{5.1}$$


Now since $F$ is a distribution function it follows that $F(x)$ is a monotone increasing function of $x$ and so the inequality "$a \le b$" is equivalent to the inequality "$F(a) \le F(b)$." Hence, from Equation (5.1), we see that

$$\begin{aligned}
F_X(x) &= P\{F(F^{-1}(U)) \le F(x)\} \\
&= P\{U \le F(x)\} \qquad \text{since } F(F^{-1}(U)) = U \\
&= F(x) \qquad \text{since } U \text{ is uniform}(0, 1) \qquad \square
\end{aligned}$$

The above proposition thus shows that we can generate a random variable $X$ from the continuous distribution function $F$ by generating a random number $U$ and then setting $X = F^{-1}(U)$.


Example 5a Suppose we wanted to generate a random variable $X$ having distribution function

$$F(x) = x^n, \qquad 0 < x < 1$$

If we let $x = F^{-1}(u)$, then

$$u = F(x) = x^n \quad \text{or, equivalently,} \quad x = u^{1/n}$$

Hence we can generate such a random variable $X$ by generating a random number $U$ and then setting $X = U^{1/n}$. □
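The proposition and Example 5a together suggest a small Python sketch. Here `inverse_transform` is a hypothetical helper that takes $F^{-1}$ as a function, and `example_5a` specializes it to $F^{-1}(u) = u^{1/n}$. Both names are ours.

```python
import random


def inverse_transform(F_inv, rng=random):
    """Return F^{-1}(U) for a random number U, as in the proposition."""
    return F_inv(rng.random())


def example_5a(n, rng=random):
    """Example 5a: F(x) = x**n on (0, 1), so X = U**(1/n)."""
    return inverse_transform(lambda u: u ** (1.0 / n), rng)
```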


The inverse transform method yields a powerful approach to generating expo- 
nential random variables, as is indicated in the next example. 


Example 5b If $X$ is an exponential random variable with rate 1, then its distribution function is given by

$$F(x) = 1 - e^{-x}$$

If we let $x = F^{-1}(u)$, then

$$u = F(x) = 1 - e^{-x}$$

or

$$1 - u = e^{-x}$$

or, taking logarithms,

$$x = -\log(1 - u)$$

Hence we can generate an exponential with parameter 1 by generating a random number $U$ and then setting

$$X = F^{-1}(U) = -\log(1 - U)$$

A small savings in time can be obtained by noting that $1 - U$ is also uniform on $(0, 1)$ and thus $-\log(1 - U)$ has the same distribution as $-\log U$. That is, the negative logarithm of a random number is exponentially distributed with rate 1.

In addition, note that if $X$ is exponential with mean 1 then, for any positive constant $c$, $cX$ is exponential with mean $c$. Hence, an exponential random variable $X$ with rate $\lambda$ (mean $1/\lambda$) can be generated by generating a random number $U$ and setting

$$X = -\frac{1}{\lambda} \log U \qquad \square$$
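In Python the recipe $X = -\frac{1}{\lambda}\log U$ becomes a one-liner. The sketch below assumes the standard `random` module, whose `random()` returns values in $[0, 1)$. We therefore take the logarithm of $1 - $ `random()`, which lies in $(0, 1]$ and is also uniform, so the logarithm is always defined. The function name is ours.

```python
import math
import random


def exponential(rate, rng=random):
    """Exponential with the given rate (mean 1/rate) by inverse transform."""
    U = 1.0 - rng.random()   # uniform on (0, 1], so log(U) is finite
    return -math.log(U) / rate
```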


Remark The above also provides us with another algorithm for generating a Poisson random variable. To begin, recall that a Poisson process with rate $\lambda$ results when the times between successive events are independent exponentials with rate $\lambda$. (See Section 2.9 of Chapter 2.) For such a process, $N(1)$, the number of events by time 1, is Poisson distributed with mean $\lambda$. However, if we let $X_i$, $i = 1, \ldots$, denote the successive interarrival times, then the $n$th event will occur at time $\sum_{i=1}^{n} X_i$, and so the number of events by time 1 can be expressed as

$$N(1) = \max\left\{ n : \sum_{i=1}^{n} X_i \le 1 \right\}$$

That is, the number of events by time 1 is equal to the largest $n$ for which the $n$th event has occurred by time 1. (For example, if the fourth event occurred by time 1 but the fifth event did not, then clearly there would have been a total of four events by time 1.) Hence, using the results of Example 5b, we can generate $N = N(1)$, a Poisson random variable with mean $\lambda$, by generating random numbers $U_1, \ldots, U_n, \ldots$ and setting

$$\begin{aligned}
N &= \max\left\{ n : \sum_{i=1}^{n} -\frac{1}{\lambda} \log U_i \le 1 \right\} \\
&= \max\left\{ n : \sum_{i=1}^{n} \log U_i \ge -\lambda \right\} \\
&= \max\{ n : \log(U_1 \cdots U_n) \ge -\lambda \} \\
&= \max\{ n : U_1 \cdots U_n \ge e^{-\lambda} \}
\end{aligned}$$

Hence, a Poisson random variable $N$ with mean $\lambda$ can be generated by successively generating random numbers until their product falls below $e^{-\lambda}$, and then setting $N$ equal to 1 less than the number of random numbers required. That is,

$$N = \min\{ n : U_1 \cdots U_n < e^{-\lambda} \} - 1 \qquad \square$$
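The product form of this Poisson generator can be sketched in Python as follows. The function name is ours. On average it uses $\lambda + 1$ random numbers, so, like the inverse transform algorithm of Section 4.2, it is best suited to small or moderate $\lambda$.

```python
import math
import random


def poisson_by_products(lam, rng=random):
    """N = min{n : U_1 ... U_n < e^{-lam}} - 1 is Poisson with mean lam."""
    threshold = math.exp(-lam)
    product = 1.0
    n = 0
    while product >= threshold:   # keep multiplying random numbers
        product *= rng.random()
        n += 1
    return n - 1
```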


The results of Example 5b along with the relationship between the gamma and the exponential distribution can be used to efficiently generate a gamma $(n, \lambda)$ random variable.


Example 5c Suppose we wanted to generate the value of a gamma $(n, \lambda)$ random variable. Since the distribution function $F$ of such a random variable is given by

$$F(x) = \int_0^x \frac{\lambda e^{-\lambda y} (\lambda y)^{n-1}}{(n - 1)!}\, dy$$

it is not possible to give a closed form expression for its inverse. However, by using the result that a gamma $(n, \lambda)$ random variable $X$ can be regarded as being the sum of $n$ independent exponentials, each with rate $\lambda$ (see Section 2.3 of Chapter 2), we can make use of Example 5b to generate $X$. Specifically, we can generate a gamma $(n, \lambda)$ random variable by generating $n$ random numbers $U_1, \ldots, U_n$ and then setting

$$X = -\frac{1}{\lambda} \log U_1 - \cdots - \frac{1}{\lambda} \log U_n = -\frac{1}{\lambda} \log(U_1 \cdots U_n)$$

where the use of the identity $\sum_{i=1}^{n} \log x_i = \log(x_1 \cdots x_n)$ is computationally time saving in that it requires only one rather than $n$ logarithmic computations. □
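A Python sketch of Example 5c follows. The function name is ours, and we again use $1 - $ `random()` so that every factor lies in $(0, 1]$. One caveat is worth stating: for very large $n$, the product $U_1 \cdots U_n$ can underflow to 0 in double precision. Summing $n$ logarithms avoids this, at the cost of giving up exactly the saving the text describes.

```python
import math
import random


def gamma_integer_shape(n, lam, rng=random):
    """Gamma (n, lam) as -(1/lam) * log(U_1 * ... * U_n)."""
    product = 1.0
    for _ in range(n):
        product *= 1.0 - rng.random()   # each factor is uniform on (0, 1]
    return -math.log(product) / lam     # a single logarithm
```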


The results of Example 5c can be used to provide an efficient way of generating 
a set of exponential random variables by first generating their sum and then, 
conditional on the value of that sum, generating the individual values. For 
example, we could generate X and Y, a pair of independent and identically 
distributed exponentials having mean 1, by first generating X + Y and then using 
the result (Exercise 36 of Chapter 2) that, given that X + Y = t, the conditional 
distribution of $X$ is uniform on $(0, t)$. The following algorithm can thus be used to generate a pair of exponentials with mean 1.


STEP 1: Generate random numbers $U_1$ and $U_2$.
STEP 2: Set $t = -\log(U_1 U_2)$.

STEP 3: Generate a random number $U_3$.

STEP 4: $X = tU_3$, $Y = t - X$.

Comparing the above with the more direct approach of generating two random numbers $U_1$ and $U_2$ and then setting $X = -\log U_1$, $Y = -\log U_2$ shows that the above algorithm saves a logarithmic computation at the cost of two multiplications and the generation of a random number.
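Steps 1 to 4 can be sketched in Python as follows. The function name is ours, and `1.0 - rng.random()` again keeps the logarithm's argument in $(0, 1]$.

```python
import math
import random


def exponential_pair(rng=random):
    """Two independent mean-1 exponentials using a single logarithm."""
    U1 = 1.0 - rng.random()      # Step 1 (values in (0, 1])
    U2 = 1.0 - rng.random()
    t = -math.log(U1 * U2)       # Step 2: t plays the role of X + Y
    U3 = rng.random()            # Step 3
    X = t * U3                   # Step 4: given X + Y = t, X is uniform on (0, t)
    return X, t - X
```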

We can also generate $k$ independent exponentials with mean 1 by first generating their sum, say by $-\log(U_1 \cdots U_k)$, and then generating $k - 1$ additional random numbers $U_1, \ldots, U_{k-1}$, which should then be ordered. If $U_{(1)} < U_{(2)} < \cdots < U_{(k-1)}$ are their ordered values, and if $-\log(U_1 \cdots U_k) = t$, then the $k$ exponentials are

$$t\,[U_{(i)} - U_{(i-1)}], \qquad i = 1, 2, \ldots, k, \quad \text{where } U_{(0)} \equiv 0,\ U_{(k)} \equiv 1$$
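A Python sketch of this spacings construction follows. The name `k_exponentials` is ours, and `sorted` is used for the ordering step.

```python
import math
import random


def k_exponentials(k, rng=random):
    """k independent mean-1 exponentials from their sum t and the
    spacings of k - 1 ordered random numbers."""
    product = 1.0
    for _ in range(k):
        product *= 1.0 - rng.random()                     # factors in (0, 1]
    t = -math.log(product)                                # t = -log(U_1 ... U_k)
    cuts = sorted(rng.random() for _ in range(k - 1))     # U_(1) <= ... <= U_(k-1)
    points = [0.0] + cuts + [1.0]                         # U_(0) = 0, U_(k) = 1
    return [t * (points[i] - points[i - 1]) for i in range(1, k + 1)]
```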


5.2 The Rejection Method 


Suppose we have a method for generating a random variable having density function $g(x)$. We can use this as the basis for generating from the continuous distribution having density function $f(x)$ by generating $Y$ from $g$ and then accepting this generated value with a probability proportional to $f(Y)/g(Y)$. Specifically, let $c$ be a constant such that

$$\frac{f(y)}{g(y)} \le c \quad \text{for all } y$$

We then have the following technique (illustrated in Figure 5.1) for generating a random variable having density $f$.


The Rejection Method


STEP 1: Generate $Y$ having density $g$.
STEP 2: Generate a random number $U$.
STEP 3: If $U \le \dfrac{f(Y)}{c\,g(Y)}$, set $X = Y$. Otherwise, return to Step 1.

[Figure 5.1. The rejection method for simulating a random variable $X$ having density function $f$: generate $Y \sim g$ and a random number $U$; if $U \le f(Y)/cg(Y)$, set $X = Y$; otherwise start again.]

The reader should note that the rejection method is exactly the same as in the case of discrete random variables, with the only difference being that densities replace mass functions. In exactly the same way as we did in the discrete case we can prove the following result.


Theorem
(i) The random variable generated by the rejection method has density $f$.
(ii) The number of iterations of the algorithm that are needed is a geometric random variable with mean $c$.


As in the discrete case it should be noted that the way in which one accepts the value $Y$ with probability $f(Y)/cg(Y)$ is by generating a random number $U$ and then accepting $Y$ if $U \le f(Y)/cg(Y)$.


Example 5d Let us use the rejection method to generate a random variable having density function

$$f(x) = 20x(1 - x)^3, \qquad 0 < x < 1$$

Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval $(0, 1)$, let us consider the rejection method with

$$g(x) = 1, \qquad 0 < x < 1$$

To determine the smallest constant $c$ such that $f(x)/g(x) \le c$, we use calculus to determine the maximum value of

$$\frac{f(x)}{g(x)} = 20x(1 - x)^3$$

Differentiation of this quantity yields

$$\frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = 20\left[(1 - x)^3 - 3x(1 - x)^2\right]$$

Setting this equal to 0 shows that the maximal value is attained when $x = \frac{1}{4}$, and thus

$$\frac{f(x)}{g(x)} \le 20\left(\frac{1}{4}\right)\left(\frac{3}{4}\right)^3 = \frac{135}{64} \equiv c$$

Hence,

$$\frac{f(x)}{c\,g(x)} = \frac{256}{27}\, x(1 - x)^3$$

and thus the rejection procedure is as follows:


STEP 1: Generate random numbers $U_1$ and $U_2$.
STEP 2: If $U_2 \le \frac{256}{27} U_1 (1 - U_1)^3$, stop and set $X = U_1$. Otherwise, return to Step 1.

The average number of times that Step 1 will be performed is $c = \frac{135}{64} \approx 2.11$. □
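Example 5d can be sketched in Python as follows. The function name is ours, and the iteration count is returned only so the mean $c = 135/64 \approx 2.11$ can be checked empirically.

```python
import random


def beta_2_4(rng=random):
    """Beta(2, 4) by rejection from the uniform with c = 135/64.
    Returns (X, number_of_iterations)."""
    iterations = 0
    while True:
        iterations += 1
        U1, U2 = rng.random(), rng.random()             # Step 1
        if U2 <= (256 / 27) * U1 * (1 - U1) ** 3:       # Step 2: accept
            return U1, iterations
```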


Example 5e Suppose we wanted to generate a random variable having the gamma $(\frac{3}{2}, 1)$ density

$$f(x) = K x^{1/2} e^{-x}, \qquad x > 0$$

where $K = 1/\Gamma(\frac{3}{2}) = 2/\sqrt{\pi}$. Because such a random variable is concentrated on the positive axis and has mean $\frac{3}{2}$, it is natural to try the rejection technique with an exponential random variable with the same mean. Hence, let

$$g(x) = \frac{2}{3} e^{-2x/3}, \qquad x > 0$$

Now

$$\frac{f(x)}{g(x)} = \frac{3K}{2}\, x^{1/2} e^{-x/3}$$

By differentiating and setting the resultant derivative equal to 0, we find that the maximal value of this ratio is obtained when

$$\frac{1}{2} x^{-1/2} e^{-x/3} = \frac{1}{3} x^{1/2} e^{-x/3}$$

that is, when $x = \frac{3}{2}$. Hence

$$c = \max \frac{f(x)}{g(x)} = \frac{3K}{2}\left(\frac{3}{2}\right)^{1/2} e^{-1/2} = \frac{3^{3/2}}{(2\pi e)^{1/2}} \qquad \text{since } K = 2/\sqrt{\pi}$$

Since

$$\frac{f(x)}{c\,g(x)} = \left(\frac{2e}{3}\right)^{1/2} x^{1/2} e^{-x/3}$$

we see that a gamma $(\frac{3}{2}, 1)$ random variable can be generated as follows:


STEP 1: Generate a random number $U_1$ and set $Y = -\frac{3}{2} \log U_1$.
STEP 2: Generate a random number $U_2$.
STEP 3: If $U_2 < (2eY/3)^{1/2} e^{-Y/3}$, set $X = Y$. Otherwise, return to Step 1.

The average number of iterations that will be needed is

$$c = 3\left(\frac{3}{2\pi e}\right)^{1/2} \approx 1.257 \qquad \square$$
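A Python sketch of the three steps of Example 5e follows. The function name is ours. In Step 1 we use $-\frac{3}{2}\log(1 - U_1)$ in place of $-\frac{3}{2}\log U_1$: the distribution is the same, and the argument of the logarithm stays in $(0, 1]$ for Python's `random()`.

```python
import math
import random


def gamma_three_halves(rng=random):
    """Gamma (3/2, 1) by rejection from an exponential with mean 3/2.
    Returns (X, number_of_iterations)."""
    iterations = 0
    while True:
        iterations += 1
        Y = -1.5 * math.log(1.0 - rng.random())                       # Step 1
        U2 = rng.random()                                              # Step 2
        if U2 < math.sqrt(2.0 * math.e * Y / 3.0) * math.exp(-Y / 3.0):  # Step 3
            return Y, iterations
```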


In the previous example, we generated a gamma random variable using the 
rejection approach with an exponential distribution having the same mean as 
the gamma. It turns out that this is always the most efficient exponential to use 
when generating a gamma random variable. To verify this, suppose we want to 
generate a random variable having density function 


f(x) =Ke*x*", x>0 


where A > 0,a > 0, and K = A“/T(q@). The preceding is the density function 
of a gamma random variable with parameters œ and A and is known to have 
mean a/d. 

Suppose we plan to generate the preceding type random variable by the 
rejection method based on the exponential density with rate u. Because 


f(x)  Ke™*xt! K 


g(x) pe p 


xæ! etx 


we see that when 0<a <1 
x0 g(x) 
thus showing that the rejection technique with an exponential can not be used in 


this case. As the gamma density reduces to the exponential when a = 1, let us 
suppose that a > 1. Now, when u > A 


mn 1) 9, 
moe) 


and so we can restrict attention to values of y that are strictly less than A. With 
such a value of u, the mean number of iterations of the algorithm that will be 
required is 
K 
c(h) = Max = Max— x! eH- 
=e) Fp 
To obtain the value of x at which the preceding maximum occurs, we differ- 
entiate and set equal to 0 to obtain 


0= (@ ‘se Lx tee ae (A = p)x* ee 


N 
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yielding that the maximum occurs at 
a—l 
im 
A—p 


a—l 
cu) = (4) cont 
u \A—p 


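Hence, the value of $\mu$ that minimizes $c(\mu)$ is the value that maximizes $\mu(\lambda-\mu)^{\alpha-1}$. Differentiation gives
$$\frac{d}{d\mu}\left\{\mu(\lambda-\mu)^{\alpha-1}\right\} = (\lambda-\mu)^{\alpha-1} - (\alpha-1)\mu(\lambda-\mu)^{\alpha-2}$$
Setting the preceding equal to 0 yields that the best value of $\mu$ satisfies
$$\lambda - \mu = (\alpha-1)\mu$$
or
$$\mu = \lambda/\alpha$$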


That is, the exponential that minimizes the mean number of iterations needed by the rejection method to generate a gamma random variable with parameters $\alpha$ and $\lambda$ has the same mean as the gamma; namely, $\alpha/\lambda$.
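The conclusion $\mu = \lambda/\alpha$ can be checked numerically. The Python sketch below (function names are illustrative) evaluates the closed form for $c(\mu)$ derived above on a fine grid of rates in $(0, \lambda)$ and returns the minimizer. With $\alpha = 3/2$ and $\lambda = 1$ it should land near $2/3$, and $c(2/3)$ should reproduce the value 1.257 of the previous example. Logarithms are used only to avoid overflow for larger $\alpha$.

```python
import math


def mean_iterations(mu, alpha, lam):
    """c(mu) = (K/mu) ((alpha-1)/(lam-mu))^(alpha-1) e^-(alpha-1); needs alpha > 1, 0 < mu < lam."""
    log_k = alpha * math.log(lam) - math.lgamma(alpha)  # log K, K = lam^alpha / Gamma(alpha)
    a1 = alpha - 1.0
    log_c = log_k - math.log(mu) + a1 * math.log(a1 / (lam - mu)) - a1
    return math.exp(log_c)


def best_rate_on_grid(alpha, lam, points=100_000):
    """Brute-force minimizer of c(mu) over the grid lam*i/points, i = 1, ..., points-1."""
    grid = (lam * i / points for i in range(1, points))
    return min(grid, key=lambda mu: mean_iterations(mu, alpha, lam))


print(best_rate_on_grid(1.5, 1.0))        # near lam/alpha = 2/3
print(mean_iterations(2 / 3, 1.5, 1.0))   # near 1.257
```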


Our next example shows how the rejection technique can be used to generate 
normal random variables. 


Example 5f Generating a Normal Random Variable To gener- 
ate a standard normal random variable Z (i.e., one with mean 0 and variance 1), 
note first that the absolute value of Z has probability density function 


$$f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad 0 < x < \infty \qquad (5.2)$$
We start by generating from the preceding density function by using the rejection method with $g$ being the exponential density function with mean 1, that is,
$$g(x) = e^{-x}, \qquad 0 < x < \infty$$
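Now
$$\frac{f(x)}{g(x)} = \sqrt{2/\pi}\; e^{x - x^2/2}$$
and so the maximum value of $f(x)/g(x)$ occurs at the value of $x$ that maximizes $x - x^2/2$. Calculus shows that this occurs when $x = 1$, and so we can take
$$c = \max_x \frac{f(x)}{g(x)} = \frac{f(1)}{g(1)} = \sqrt{2e/\pi}$$
Because
$$\frac{f(x)}{c\,g(x)} = \exp\left\{x - \frac{x^2}{2} - \frac{1}{2}\right\} = \exp\left\{-\frac{(x-1)^2}{2}\right\}$$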


it follows that we can generate the absolute value of a standard normal random 
variable as follows: 


STEP 1: Generate $Y$, an exponential random variable with rate 1.
STEP 2: Generate a random number $U$.
STEP 3: If $U \le \exp\{-(Y-1)^2/2\}$, set $X = Y$. Otherwise, return to Step 1.


Once we have simulated a random variable $X$ having density function as in Equation (5.2), and such a random variable is thus distributed as the absolute value of a standard normal, we can then obtain a standard normal $Z$ by letting $Z$ be equally likely to be either $X$ or $-X$.
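A minimal Python sketch of the three-step rejection sampler for $|Z|$ followed by a random sign, assuming Python's `random` module; the function names are illustrative.

```python
import math
import random


def abs_normal(rng=random):
    """Rejection sampler for |Z| using exponential(1) proposals (Example 5f)."""
    while True:
        y = -math.log(1.0 - rng.random())          # Step 1: Y ~ exponential with rate 1
        u = rng.random()                            # Step 2
        if u <= math.exp(-(y - 1.0) ** 2 / 2.0):    # Step 3: accept w.p. f(Y)/(c g(Y))
            return y


def standard_normal(rng=random):
    """Attach a random sign, so Z = X or -X with probability 1/2 each."""
    x = abs_normal(rng)
    return x if rng.random() < 0.5 else -x
```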

In Step 3, the value $Y$ is accepted if $U \le \exp\{-(Y-1)^2/2\}$, which is equivalent to $-\log U \ge (Y-1)^2/2$. However, in Example 5b it was shown that $-\log U$ is exponential with rate 1, and so the above is equivalent to the following:

STEP 1: Generate independent exponentials with rate 1, $Y_1$ and $Y_2$.
STEP 2: If $Y_2 \ge (Y_1 - 1)^2/2$, set $X = Y_1$. Otherwise, return to Step 1.


Suppose now that the foregoing results in $Y_1$ being accepted, and so we know that $Y_2$ is larger than $(Y_1 - 1)^2/2$. By how much does the one exceed the other? To answer this, recall that $Y_2$ is exponential with rate 1, and so, given that it exceeds some value, the amount by which $Y_2$ exceeds $(Y_1 - 1)^2/2$ [i.e., its "additional life" beyond the time $(Y_1 - 1)^2/2$] is (by the memoryless property) also exponentially distributed with rate 1. That is, when we accept in Step 2 not only do we obtain $X$ (the absolute value of a standard normal) but by computing $Y_2 - (Y_1 - 1)^2/2$ we can also generate an exponential random variable (independent of $X$) having rate 1.
Hence, summing up, we have the following algorithm that generates an exponential with rate 1 and an independent standard normal random variable.

STEP 1: Generate $Y_1$, an exponential random variable with rate 1.

STEP 2: Generate $Y_2$, an exponential random variable with rate 1.

STEP 3: If $Y_2 - (Y_1 - 1)^2/2 > 0$, set $Y = Y_2 - (Y_1 - 1)^2/2$ and go to Step 4. Otherwise, go to Step 1.

STEP 4: Generate a random number $U$ and set
$$Z = \begin{cases} Y_1 & \text{if } U \le \frac{1}{2} \\ -Y_1 & \text{if } U > \frac{1}{2} \end{cases}$$

The random variables $Z$ and $Y$ generated by the foregoing are independent with $Z$ being normal with mean 0 and variance 1 and $Y$ being exponential with rate 1. (If you want the normal random variable to have mean $\mu$ and variance $\sigma^2$, just take $\mu + \sigma Z$.) □
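A Python sketch of the four-step algorithm, assuming Python's `random` module; the function name `normal_and_exponential` is illustrative. It returns the pair $(Z, Y)$ described above.

```python
import math
import random


def normal_and_exponential(rng=random):
    """Return (Z, Y): Z standard normal and Y an independent rate-1 exponential."""
    while True:
        y1 = -math.log(1.0 - rng.random())      # Step 1
        y2 = -math.log(1.0 - rng.random())      # Step 2
        excess = y2 - (y1 - 1.0) ** 2 / 2.0     # Step 3: exponential(1) when positive
        if excess > 0:
            u = rng.random()                     # Step 4: random sign
            z = y1 if u <= 0.5 else -y1
            return z, excess
```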


Remarks 


1. Since $c = \sqrt{2e/\pi} \approx 1.32$, the foregoing requires a geometrically distributed number of iterations of Step 2 with mean 1.32.

2. If we want to generate a sequence of standard normal random variables, we can use the exponential random variable $Y$ obtained in Step 3 as the initial exponential needed in Step 1 for the next normal to be generated. Hence, on the average, we can simulate a standard normal by generating 1.64 (= 2 × 1.32 − 1) exponentials and computing 1.32 squares.

3. The sign of the standard normal can be determined without generating a new random number (as in Step 4). The first digit of an earlier random number can be used. That is, an earlier random number $r_1, r_2, \ldots, r_k$ should be used as $r_2, r_3, \ldots, r_k$, with $r_1$ being used to determine the sign. □
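Remark 2 can be sketched in Python as a loop that recycles the accepted excess from Step 3 as the next Step 1 exponential; a fresh $Y_1$ is drawn only after a rejection. The function name is illustrative, and for simplicity the sign uses a fresh random number as in Step 4 rather than the digit trick of Remark 3.

```python
import math
import random


def normal_stream(n, rng=random):
    """Generate n standard normals, reusing the accepted excess as the next Y1 (Remark 2)."""
    out = []
    y1 = -math.log(1.0 - rng.random())            # initial Step 1 exponential
    while len(out) < n:
        y2 = -math.log(1.0 - rng.random())        # Step 2
        excess = y2 - (y1 - 1.0) ** 2 / 2.0       # Step 3
        if excess > 0:
            out.append(y1 if rng.random() <= 0.5 else -y1)   # Step 4
            y1 = excess                           # exponential(1), independent of the normal just produced
        else:
            y1 = -math.log(1.0 - rng.random())    # rejected: return to Step 1 with a new Y1
    return out
```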


The rejection method is particularly useful when we need to simulate a random 
variable conditional on it being in some region. This is indicated by our next 
example. 


Example 5g Suppose we want to generate a gamma (2, 1) random variable 
conditional on its value exceeding 5. That is, we want to generate a random 
variable having density function 


$$f(x) = \frac{x e^{-x}}{\int_5^\infty x e^{-x}\,dx} = \frac{x e^{-x}}{6e^{-5}}, \qquad x \ge 5$$
where the preceding integral was evaluated by using integration by parts. Because a gamma(2, 1) random variable has expected value 2, we will use the rejection method based on an exponential with mean 2 that is conditioned to be at least 5. That is, we will use
$$g(x) = \frac{\frac{1}{2}e^{-x/2}}{e^{-5/2}}, \qquad x \ge 5$$


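Now,
$$\frac{f(x)}{g(x)} = \frac{e^{5/2}}{3}\, x e^{-x/2}, \qquad x \ge 5$$
Because $x e^{-x/2}$ is a decreasing function of $x$ when $x > 5$, it follows that the number of iterations needed in the algorithm will be geometric with mean
$$c = \max_{x \ge 5} \frac{f(x)}{g(x)} = \frac{f(5)}{g(5)} = \frac{5}{3}$$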


To generate an exponential with rate 1/2 that is conditioned to exceed 5, we use 
the fact that the amount by which it exceeds 5 is (by the lack of memory property 
of exponential random variables) also exponential with rate 1/2. Therefore, if X 
is exponential with rate 1/2, it follows that 5+ X has the same distribution as 
does X conditioned to exceed 5. Therefore, we have the following algorithm to 
simulate a random variable X having density function f. 


STEP 1: Generate a random number $U$.
STEP 2: Set $Y = 5 - 2\log U$.
STEP 3: Generate a random number $U$.
STEP 4: If $U \le \frac{Y}{5}\, e^{(5-Y)/2}$, set $X = Y$ and stop; otherwise return to Step 1. □
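A Python sketch of this algorithm, assuming Python's `random` module; the function name is illustrative. As a check, integration by parts gives $E[X \mid X > 5] = \int_5^\infty x^2 e^{-x}\,dx / (6e^{-5}) = 37/6 \approx 6.17$ for a gamma(2, 1) random variable, which the sample mean should approach.

```python
import math
import random


def gamma21_given_above_5(rng=random):
    """Rejection sampler for a gamma(2, 1) variable conditioned to exceed 5."""
    while True:
        y = 5.0 - 2.0 * math.log(1.0 - rng.random())   # Steps 1-2: 5 + exponential with mean 2
        u = rng.random()                                 # Step 3
        if u <= (y / 5.0) * math.exp((5.0 - y) / 2.0):   # Step 4: accept w.p. f(Y)/(c g(Y))
            return y
```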


Just as we simulated a normal random variable in Example 5f by using the 
rejection method based on an exponential random variable, we can also effec- 
tively simulate a normal random variable that is conditioned to lie in some 
interval by using the rejection method based on an exponential random vari- 
able. The details (including the determination of the best exponential mean) are 
illustrated in Section 8.8. 


5.3 The Polar Method for Generating Normal Random 
Variables 


Let $X$ and $Y$ be independent standard normal random variables and let $R$ and $\Theta$ denote the polar coordinates of the vector $(X, Y)$. That is (see Figure 5.2),
$$R^2 = X^2 + Y^2, \qquad \tan\Theta = \frac{Y}{X}$$

Figure 5.2. Polar Coordinates.


Since $X$ and $Y$ are independent, their joint density is the product of their individual densities and is thus given by
$$f(x, y) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,\frac{1}{\sqrt{2\pi}} e^{-y^2/2} = \frac{1}{2\pi} e^{-(x^2+y^2)/2} \qquad (5.3)$$


To determine the joint density of $R^2$ and $\Theta$, call it $f(d, \theta)$, we make the change of variables
$$d = x^2 + y^2, \qquad \theta = \tan^{-1}\left(\frac{y}{x}\right)$$
As the Jacobian of this transformation (that is, the determinant of partial derivatives of $d$ and $\theta$ with respect to $x$ and $y$) is easily shown to equal 2, it follows from Equation (5.3) that the joint density function of $R^2$ and $\Theta$ is given by
$$f(d, \theta) = \frac{1}{2} e^{-d/2}\,\frac{1}{2\pi}, \qquad 0 < d < \infty,\ 0 < \theta < 2\pi$$
However, as this is equal to the product of an exponential density having mean 2 (namely, $\frac{1}{2}e^{-d/2}$) and the uniform density on $(0, 2\pi)$ [namely, $(2\pi)^{-1}$], it follows that

$R^2$ and $\Theta$ are independent, with $R^2$ being exponential with mean 2 and $\Theta$ being uniformly distributed over $(0, 2\pi)$ (5.4)
We can now generate a pair of independent standard normal random variables $X$ and $Y$ by using (5.4) to first generate their polar coordinates and then transform back to rectangular coordinates. This is accomplished as follows:

STEP 1: Generate random numbers $U_1$ and $U_2$.

STEP 2: $R^2 = -2\log U_1$ (and thus $R^2$ is exponential with mean 2).
$\Theta = 2\pi U_2$ (and thus $\Theta$ is uniform between 0 and $2\pi$).

STEP 3: Now let
$$X = R\cos\Theta = \sqrt{-2\log U_1}\,\cos(2\pi U_2)$$
$$Y = R\sin\Theta = \sqrt{-2\log U_1}\,\sin(2\pi U_2) \qquad (5.5)$$


The transformations given by Equations (5.5) are known as the Box—Muller 
transformations. 
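A minimal Python sketch of the Box–Muller transformations (5.5), assuming Python's `random` module; the function name is illustrative. It uses `1 - random()` for $U_1$ so that the logarithm is always finite.

```python
import math
import random


def box_muller(rng=random):
    """Return a pair of independent standard normals via Equations (5.5)."""
    u1 = 1.0 - rng.random()                  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))       # R, where R^2 = -2 log U1 is exponential with mean 2
    theta = 2.0 * math.pi * u2               # Theta uniform on (0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)
```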

Unfortunately, the use of the Box–Muller transformations (5.5) to generate a pair of independent standard normals is computationally not very efficient: the reason for this is the need to compute the sine and cosine trigonometric functions. There is, however, a way to get around this time-consuming difficulty by an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation which generates $U$ and then computes the sine and cosine of $2\pi U$). To begin, note that if $U$ is uniform on $(0, 1)$ then $2U$ is uniform on $(0, 2)$ and so $2U - 1$ is uniform on $(-1, 1)$. Thus, if we generate random numbers $U_1$ and $U_2$ and set
$$V_1 = 2U_1 - 1, \qquad V_2 = 2U_2 - 1$$


then $(V_1, V_2)$ is uniformly distributed in the square of area 4 centered at $(0, 0)$; see Figure 5.3.

Suppose now that we continually generate such pairs $(V_1, V_2)$ until we obtain one that is contained in the circle of radius 1 centered at $(0, 0)$, that is, until $(V_1, V_2)$ is such that $V_1^2 + V_2^2 \le 1$. It now follows that such a pair $(V_1, V_2)$ is uniformly distributed in the circle. If we let $R$ and $\Theta$ denote the polar coordinates of this pair, then it is not difficult to verify that $R$ and $\Theta$ are independent, with $R^2$ being uniformly distributed on $(0, 1)$ (see Exercise 21) and with $\Theta$ being uniformly distributed over $(0, 2\pi)$. Since $\Theta$ is thus a random angle, it follows that we can generate the sine and cosine of a random angle $\Theta$ by generating a random point $(V_1, V_2)$ in the circle and then setting
$$\sin\Theta = \frac{V_2}{R} = \frac{V_2}{(V_1^2 + V_2^2)^{1/2}}, \qquad \cos\Theta = \frac{V_1}{R} = \frac{V_1}{(V_1^2 + V_2^2)^{1/2}}$$

Figure 5.3. $(V_1, V_2)$ Uniformly Distributed in the Square.


It now follows from the Box–Muller transformation (5.5) that we can generate independent standard normals by generating a random number $U$ and setting
$$X = (-2\log U)^{1/2} \frac{V_1}{(V_1^2 + V_2^2)^{1/2}}, \qquad Y = (-2\log U)^{1/2} \frac{V_2}{(V_1^2 + V_2^2)^{1/2}} \qquad (5.6)$$
In fact, since $R^2 = V_1^2 + V_2^2$ is itself uniformly distributed over $(0, 1)$ and is independent of the random angle $\Theta$, we can use it as the random number $U$ needed in Equations (5.6). Therefore, letting $S = R^2$, we obtain that
$$X = (-2\log S)^{1/2} \frac{V_1}{S^{1/2}} = V_1\left(\frac{-2\log S}{S}\right)^{1/2}$$
$$Y = (-2\log S)^{1/2} \frac{V_2}{S^{1/2}} = V_2\left(\frac{-2\log S}{S}\right)^{1/2}$$
are independent standard normals when $(V_1, V_2)$ is a randomly chosen point in the circle of radius 1 centered at the origin, and $S = V_1^2 + V_2^2$.

Summing up, we thus have the following approach to generating a pair of 
independent standard normals: 


STEP 1: Generate random numbers $U_1$ and $U_2$.

STEP 2: Set $V_1 = 2U_1 - 1$, $V_2 = 2U_2 - 1$, $S = V_1^2 + V_2^2$.

STEP 3: If $S > 1$, return to Step 1.

STEP 4: Return the independent standard normals
$$X = \sqrt{\frac{-2\log S}{S}}\,V_1, \qquad Y = \sqrt{\frac{-2\log S}{S}}\,V_2$$

The above is called the polar method. Since the probability that a random point in the square will fall within the circle is equal to $\pi/4$ (the area of the circle divided by the area of the square), it follows that, on average, the polar method will require $4/\pi \approx 1.273$ iterations of Step 1. Hence it will, on average, require 2.546 random numbers, 1 logarithm, 1 square root, 1 division, and 4.546 multiplications to generate two independent unit normals.
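A Python sketch of the polar method, assuming Python's `random` module; the function name is illustrative. It also returns the number of passes through Step 1, so the $4/\pi$ average can be observed. The sketch additionally rejects the measure-zero case $S = 0$, a guard not in the text, because $\log 0$ is undefined.

```python
import math
import random


def polar_method(rng=random):
    """Return (X, Y, iterations): two independent standard normals and the Step 1 count."""
    iterations = 0
    while True:
        iterations += 1
        v1 = 2.0 * rng.random() - 1.0          # Steps 1-2
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        # Step 3: reject points outside the unit circle (and s == 0, an added guard).
        if 0.0 < s <= 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)    # Step 4
            return factor * v1, factor * v2, iterations
```

No sine or cosine is evaluated; the point $(V_1, V_2)$ supplies them implicitly, which is the efficiency gain over `box_muller`-style code.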


5.4 Generating a Poisson Process 


Suppose we wanted to generate the first $n$ event times of a Poisson process with rate $\lambda$. To do so we make use of the result that the times between successive events for such a process are independent exponential random variables each with rate $\lambda$. Thus, one way to generate the process is to generate these interarrival times. So if we generate $n$ random numbers $U_1, U_2, \ldots, U_n$ and set $X_i = -\frac{1}{\lambda}\log U_i$, then $X_i$ can be regarded as the time between the $(i-1)$st and the $i$th event of the Poisson process. Since the actual time of the $j$th event will equal the sum of the first $j$ interarrival times, it thus follows that the generated values of the first $n$ event times are $\sum_{i=1}^{j} X_i$, $j = 1, \ldots, n$.

If we wanted to generate the first $T$ time units of the Poisson process, we can follow the preceding procedure of successively generating the interarrival times, stopping when their sum exceeds $T$. That is, the following algorithm can be used to generate all the event times occurring in $(0, T)$ of a Poisson process having rate $\lambda$. In the algorithm $t$ refers to time, $I$ is the number of events that have occurred by time $t$, and $S(I)$ is the most recent event time.


Generating the First T Time Units of a Poisson Process
with Rate λ


STEP 1: $t = 0$, $I = 0$.

STEP 2: Generate a random number $U$.
STEP 3: $t = t - \frac{1}{\lambda}\log U$. If $t > T$, stop.
STEP 4: $I = I + 1$, $S(I) = t$.

STEP 5: Go to Step 2.


The final value of $I$ in the preceding algorithm will represent the number of events that occur by time $T$, and the values $S(1), \ldots, S(I)$ will be the $I$ event times in increasing order.
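A Python sketch of Steps 1 to 5, assuming Python's `random` module; the function name is illustrative. The list `times` plays the role of $S(1), \ldots, S(I)$, and its length is the final $I$.

```python
import math
import random


def poisson_process(rate, horizon, rng=random):
    """Event times in (0, horizon] of a Poisson process with the given rate."""
    t, times = 0.0, []                             # Step 1
    while True:
        t -= math.log(1.0 - rng.random()) / rate   # Steps 2-3: add an exponential interarrival time
        if t > horizon:
            return times
        times.append(t)                            # Step 4: I = I + 1, S(I) = t
```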

Another way to simulate the first $T$ time units of a Poisson process with rate $\lambda$ starts by simulating $N(T)$, the total number of events that occur by time $T$. Because $N(T)$ is Poisson with mean $\lambda T$, this is easily accomplished by one of the approaches given in Chapter 4. If the simulated value of $N(T)$ is $n$, then $n$ random numbers $U_1, \ldots, U_n$ are generated, and $\{TU_1, \ldots, TU_n\}$ are taken as the set of event times by time $T$ of the Poisson process.

To verify that the preceding method works, let $N(t)$ equal the number of values in the set $\{TU_1, \ldots, TU_{N(T)}\}$ that are less than $t$. We must now argue that $N(t)$, $0 \le t \le T$, is a Poisson process. To show that it has independent and stationary increments, let $I_1, \ldots, I_r$ be $r$ disjoint time intervals in the interval $[0, T]$. Say that the $i$th Poisson event is a type $j$ event if $TU_i$ lies in the $j$th of these $r$ disjoint time intervals, $j = 1, \ldots, r$, and say it is type $r + 1$ if it does not lie in any of the $r$ intervals. Because the $U_i$, $i \ge 1$, are independent, it follows that each of the Poisson number of events $N(T)$ is independently classified as being of one of the types $1, \ldots, r + 1$, with respective probabilities $p_1, \ldots, p_{r+1}$, where $p_i$ is the length of the interval $I_i$ divided by $T$ when $i \le r$, and $p_{r+1} = 1 - \sum_{i=1}^{r} p_i$. It now follows, from the results of Section 2.8, that $N_1, \ldots, N_r$, the numbers of events in the disjoint intervals, are independent Poisson random variables, with $E[N_i]$ equal to $\lambda$ multiplied by the length of the interval $I_i$; this establishes that $N(t)$, $0 \le t \le T$, has stationary as well as independent increments. Because the number of events in any interval of length $h$ is Poisson distributed with mean $\lambda h$, we have
$$\lim_{h \to 0} \frac{P\{N(h) = 1\}}{h} = \lim_{h \to 0} \frac{\lambda h e^{-\lambda h}}{h} = \lambda$$
and
$$\lim_{h \to 0} \frac{P\{N(h) \ge 2\}}{h} = \lim_{h \to 0} \frac{1 - e^{-\lambda h} - \lambda h e^{-\lambda h}}{h} = 0$$
which completes the verification.
If all we wanted was to simulate the set of event times of the Poisson process, then the preceding approach would be more efficient than simulating the exponentially distributed interarrival times. However, we usually desire the event times in increasing order; thus, we would also need to order the values $TU_i$, $i = 1, \ldots, n$.
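A Python sketch of this second method, assuming Python's `random` module; the function names are illustrative. The Poisson sampler shown is an inverse transform sampler, one standard choice among the Chapter 4 approaches; it suits moderate means but not very large ones, since $e^{-\text{mean}}$ underflows. The final `sorted` call performs the ordering step just mentioned.

```python
import math
import random


def poisson_variate(mean, rng=random):
    """Inverse transform sampler for a Poisson random variable with the given mean."""
    u = rng.random()
    i, p = 0, math.exp(-mean)        # p = P{N = i}
    cdf = p
    while u >= cdf and p > 0.0:      # p > 0 stops the loop if the terms underflow
        p *= mean / (i + 1)
        i += 1
        cdf += p
    return i


def poisson_process_by_uniforms(rate, horizon, rng=random):
    """Simulate N(T), place that many points uniformly on (0, T), then sort them."""
    n = poisson_variate(rate * horizon, rng)
    return sorted(horizon * rng.random() for _ in range(n))
```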


5.5 Generating a Nonhomogeneous Poisson Process 


An extremely important counting process for modeling purposes is the nonhomogeneous Poisson process, which relaxes the Poisson process assumption of stationary increments. Thus, it allows for the possibility that the arrival rate need not be constant but can vary with time. It is usually very difficult to obtain analytical results for a mathematical model that assumes a nonhomogeneous Poisson arrival process, and as a result such processes are not applied as often as they should be. However, because simulation can be used to analyze such models, we expect that such mathematical models will become more common.

Suppose that we wanted to simulate the first $T$ time units of a nonhomogeneous Poisson process with intensity function $\lambda(t)$. The first method we present, called the thinning or random sampling approach, starts by choosing a value $\lambda$ which is such that
$$\lambda(t) \le \lambda \quad \text{for all } t \le T$$
Now, as shown in Chapter 2, such a nonhomogeneous Poisson process can be generated by a random selection of the event times of a Poisson process having rate $\lambda$. That is, if an event of a Poisson process with rate $\lambda$ that occurs at time $t$ is counted (independently of what has transpired previously) with probability $\lambda(t)/\lambda$, then the process of counted events is a nonhomogeneous Poisson process with intensity function $\lambda(t)$, $0 \le t \le T$. Hence, by simulating a Poisson process and then randomly counting its events, we can generate the desired nonhomogeneous Poisson process. This can be written algorithmically as follows.


Generating the First T Time Units of a Nonhomogeneous 
Poisson Process 


STEP 1: $t = 0$, $I = 0$.

STEP 2: Generate a random number $U$.

STEP 3: $t = t - \frac{1}{\lambda}\log U$. If $t > T$, stop.

STEP 4: Generate a random number $U$.

STEP 5: If $U \le \lambda(t)/\lambda$, set $I = I + 1$, $S(I) = t$.
STEP 6: Go to Step 2.


In the above $\lambda(t)$ is the intensity function and $\lambda$ is such that $\lambda(t) \le \lambda$. The final value of $I$ represents the number of events by time $T$, and $S(1), \ldots, S(I)$ are the event times.
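A Python sketch of the thinning algorithm, assuming Python's `random` module and a caller-supplied intensity function that never exceeds the bound `lam`; the function name is illustrative.

```python
import math
import random


def thinning(intensity, lam, horizon, rng=random):
    """Event times in (0, horizon] of a nonhomogeneous Poisson process, assuming intensity(t) <= lam."""
    t, times = 0.0, []                            # Step 1
    while True:
        t -= math.log(1.0 - rng.random()) / lam   # Steps 2-3: next event of the rate-lam process
        if t > horizon:
            return times
        if rng.random() <= intensity(t) / lam:    # Steps 4-5: keep it w.p. intensity(t)/lam
            times.append(t)
```

For instance, `thinning(lambda t: 10 + t, 11.0, 1.0)` simulates the intensity $\lambda(s) = 10 + s$, $0 < s < 1$, used as an example later in this section; its expected number of events is $\int_0^1 (10 + s)\,ds = 10.5$.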

The above procedure, referred to as the thinning algorithm (because it "thins" the homogeneous Poisson points), is clearly most efficient, in the sense of having the fewest rejected event times, when $\lambda(t)$ is near $\lambda$ throughout the interval. Thus, an obvious improvement is to break up the interval into subintervals and then use the procedure over each subinterval. That is, determine appropriate values $k$, $0 = t_0 < t_1 < t_2 < \cdots < t_k < t_{k+1} = T$, $\lambda_1, \ldots, \lambda_{k+1}$ such that
$$\lambda(s) \le \lambda_i \quad \text{if } t_{i-1} \le s < t_i, \quad i = 1, \ldots, k+1 \qquad (5.7)$$


Now generate the nonhomogeneous Poisson process over the interval $(t_{i-1}, t_i)$ by generating exponential random variables with rate $\lambda_i$, and accepting the generated event occurring at time $s$, $s \in (t_{i-1}, t_i)$, with probability $\lambda(s)/\lambda_i$. Because of the memoryless property of the exponential and the fact that the rate of an exponential can be changed upon multiplication by a constant, it follows that there is no loss of efficiency in going from one subinterval to the next. That is, if we are at $t \in (t_{i-1}, t_i)$ and generate $X$, an exponential with rate $\lambda_i$, which is such that $t + X > t_i$, then we can use $\lambda_i[X - (t_i - t)]/\lambda_{i+1}$ as the next exponential with rate $\lambda_{i+1}$.

We thus have the following algorithm for generating the first $T$ time units of a nonhomogeneous Poisson process with intensity function $\lambda(s)$ when the relations (5.7) are satisfied. In the algorithm $t$ represents the present time, $J$ the present interval (i.e., $J = j$ when $t_{j-1} \le t < t_j$), $I$ the number of events so far, and $S(1), \ldots, S(I)$ the event times.


Generating the First T Time Units of a Nonhomogeneous 
Poisson Process 


STEP 1: $t = 0$, $J = 1$, $I = 0$.

STEP 2: Generate a random number $U$ and set $X = -\frac{1}{\lambda_J}\log U$.
STEP 3: If $t + X > t_J$, go to Step 8.

STEP 4: $t = t + X$.

STEP 5: Generate a random number $U$.

STEP 6: If $U \le \lambda(t)/\lambda_J$, set $I = I + 1$, $S(I) = t$.

STEP 7: Go to Step 2.

STEP 8: If $J = k + 1$, stop.

STEP 9: $X = (X - t_J + t)\lambda_J/\lambda_{J+1}$, $t = t_J$, $J = J + 1$.

STEP 10: Go to Step 3.
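A Python sketch of these ten steps, assuming Python's `random` module and that every bound $\lambda_j$ is strictly positive (Step 9 divides by $\lambda_{J+1}$). The hypothetical list `breaks` holds $t_1, \ldots, t_{k+1}$ (so its last entry is $T$) and `rates` holds $\lambda_1, \ldots, \lambda_{k+1}$; Python indices start at 0, so `j = 0` corresponds to $J = 1$.

```python
import math
import random


def piecewise_thinning(intensity, breaks, rates, rng=random):
    """Thinning with bound rates[j] on the subinterval that ends at breaks[j]."""
    t, j, times = 0.0, 0, []                                   # Step 1
    x = -math.log(1.0 - rng.random()) / rates[j]               # Step 2
    while True:
        if t + x > breaks[j]:                                  # Step 3
            if j == len(breaks) - 1:                           # Step 8: last subinterval
                return times
            x = (x - breaks[j] + t) * rates[j] / rates[j + 1]  # Step 9: rescale the leftover
            t, j = breaks[j], j + 1
            continue                                           # Step 10: back to Step 3
        t += x                                                 # Step 4
        if rng.random() <= intensity(t) / rates[j]:            # Steps 5-6
            times.append(t)
        x = -math.log(1.0 - rng.random()) / rates[j]           # Step 7: back to Step 2
```

For example, with $\lambda(t) = t$ on $(0, 10)$, `piecewise_thinning(lambda t: t, [5.0, 10.0], [5.0, 10.0])` uses the bound 5 on $(0, 5)$ and 10 on $(5, 10)$ instead of 10 throughout, and should average 50 events.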


Suppose now that over some subinterval $(t_{i-1}, t_i)$ we have that $\underline{\lambda}_i > 0$, where
$$\underline{\lambda}_i = \inf\{\lambda(s): t_{i-1} \le s < t_i\}$$
In such a situation we should not use the thinning algorithm directly but rather should first simulate a Poisson process with rate $\underline{\lambda}_i$ over the desired interval and then simulate a nonhomogeneous Poisson process with the intensity function $\lambda(s) - \underline{\lambda}_i$ when $s \in (t_{i-1}, t_i)$. (The final exponential generated for the Poisson process, which carries one beyond the desired boundary, need not be wasted but can be suitably transformed so as to be reusable.) The superposition (or merging) of the two processes yields the desired process over the interval. The reason for doing it this way is that it saves the need to generate uniform random variables for a Poisson distributed number, with mean $\underline{\lambda}_i(t_i - t_{i-1})$, of the event times. For example, consider the case where
$$\lambda(s) = 10 + s, \qquad 0 < s < 1$$
Using the thinning method with $\lambda = 11$ would generate an expected number of 11 events, each of which would require a random number to determine whether or not it should be accepted. On the other hand, generating a Poisson process with rate 10 and then merging it with a nonhomogeneous Poisson process with rate $\lambda(s) = s$, $0 < s < 1$ (generated by the thinning algorithm with $\lambda = 1$), would yield an equally distributed number of event times, but with the expected number of events needing to be checked for acceptance equal to 1.
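A Python sketch of this superposition for $\lambda(s) = 10 + s$, assuming Python's `random` module; the function names are illustrative. It also returns how many candidate events had to be checked, so the expected value of 1 can be observed.

```python
import math
import random


def homogeneous_times(rate, horizon, rng):
    """Event times in (0, horizon] of a Poisson process with the given rate."""
    t, out = 0.0, []
    while True:
        t -= math.log(1.0 - rng.random()) / rate
        if t > horizon:
            return out
        out.append(t)


def intensity_10_plus_s(rng=random):
    """lambda(s) = 10 + s on (0, 1): a rate-10 process merged with a thinned lambda(s) = s process."""
    base = homogeneous_times(10.0, 1.0, rng)               # needs no acceptance checks
    candidates = homogeneous_times(1.0, 1.0, rng)          # thinning with bound lambda = 1
    extra = [s for s in candidates if rng.random() <= s]   # keep s w.p. s / 1
    return sorted(base + extra), len(candidates)
```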

A second method for simulating a nonhomogeneous Poisson process having intensity function $\lambda(t)$, $t \ge 0$, is to directly generate the successive event times. So let $S_1, S_2, \ldots$ denote the successive event times of such a process. As these random variables are clearly dependent, we generate them in sequence, starting with $S_1$, and then using the generated value of $S_1$ to generate $S_2$, and so on.

To start, note that if an event occurs at time $s$, then, independent of what has occurred prior to $s$, the additional time until the next event has the distribution $F_s$, given by
$$\begin{aligned}
F_s(x) &= P\{\text{time from } s \text{ until next event is less than } x \mid \text{event at } s\} \\
&= P\{\text{next event is before } x + s \mid \text{event at } s\} \\
&= P\{\text{event between } s \text{ and } s + x \mid \text{event at } s\} \\
&= P\{\text{event between } s \text{ and } s + x\} \quad \text{by independent increments} \\
&= 1 - P\{0 \text{ events in } (s, s + x)\} \\
&= 1 - \exp\left(-\int_s^{s+x} \lambda(y)\,dy\right) \\
&= 1 - \exp\left(-\int_0^{x} \lambda(s + y)\,dy\right) \qquad (5.8)
\end{aligned}$$


We can now simulate the event times $S_1, S_2, \ldots$ by generating $S_1$ from the distribution $F_0$; if the simulated value of $S_1$ is $s_1$, we generate $S_2$ by adding $s_1$ to a generated value from the distribution $F_{s_1}$; if this sum is $s_2$, we generate $S_3$ by adding $s_2$ to a generated value from the distribution $F_{s_2}$; and so on. The method used to simulate from these distributions should of course depend on their form. In the following example the distributions $F_s$ are easily inverted and so the inverse transform method can be applied.


Example 5h Suppose that $\lambda(t) = 1/(t + a)$, $t \ge 0$, for some positive constant $a$. Then
$$\int_0^x \lambda(s + y)\,dy = \int_0^x \frac{1}{s + y + a}\,dy = \log\left(\frac{x + s + a}{s + a}\right)$$
Hence, from Equation (5.8),
$$F_s(x) = 1 - \frac{s + a}{x + s + a} = \frac{x}{x + s + a}$$
To invert this, suppose that $x = F_s^{-1}(u)$, and so
$$u = F_s(x) = \frac{x}{x + s + a}$$
or, equivalently,
$$x = \frac{u(s + a)}{1 - u}$$
That is,
$$F_s^{-1}(u) = (s + a)\frac{u}{1 - u}$$
l—u 
We can therefore generate the successive event times $S_1, S_2, \ldots$ by generating random numbers $U_1, U_2, \ldots$ and then recursively setting
$$S_1 = \frac{aU_1}{1 - U_1}$$
$$S_2 = S_1 + (S_1 + a)\frac{U_2}{1 - U_2} = \frac{S_1 + aU_2}{1 - U_2}$$
and, in general,
$$S_j = S_{j-1} + (S_{j-1} + a)\frac{U_j}{1 - U_j} = \frac{S_{j-1} + aU_j}{1 - U_j}, \qquad j \ge 2$$
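A Python sketch of this recursion, assuming Python's `random` module; the function name is illustrative. Starting from $S_0 = 0$ makes the general formula reproduce $S_1 = aU_1/(1 - U_1)$. As a check, the number of events by time $t$ is Poisson with mean $\int_0^t \frac{dy}{y + a} = \log\frac{t + a}{a}$; with $a = 1$ and $t = e^2 - 1$ that mean is 2.

```python
import random


def event_times_5h(a, n, rng=random):
    """First n event times when lambda(t) = 1/(t + a), by inverting F_s (Example 5h)."""
    times, s = [], 0.0                    # s holds S_{j-1}, starting from S_0 = 0
    for _ in range(n):
        u = rng.random()                  # in [0, 1), so 1 - u > 0
        s = (s + a * u) / (1.0 - u)       # S_j = (S_{j-1} + a U_j) / (1 - U_j)
        times.append(s)
    return times
```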


Exercises 

1. Give a method for generating a random variable having density function
$$f(x) = e^x/(e - 1), \qquad 0 \le x \le 1$$

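2. Give a method to generate a random variable having density function
$$f(x) = \begin{cases} \dfrac{x - 2}{2} & \text{if } 2 \le x \le 3 \\[4pt] \dfrac{2 - x/3}{2} & \text{if } 3 \le x \le 6 \end{cases}$$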
3. Use the inverse transform method to generate a random variable having
distribution function 


4. Give a method for generating a random variable having distribution function
$$F(x) = 1 - \exp(-\alpha x^{\beta}), \qquad 0 < x < \infty$$
A random variable having such a distribution is said to be a Weibull random variable.


5. Give a method for generating a random variable having density function
$$f(x) = \begin{cases} e^{2x}, & -\infty < x < 0 \\ e^{-2x}, & 0 < x < \infty \end{cases}$$


6. Let $X$ be an exponential random variable with mean 1. Give an efficient algorithm for simulating a random variable whose distribution is the conditional distribution of $X$ given that $X < 0.05$. That is, its density function is
$$f(x) = \frac{e^{-x}}{1 - e^{-0.05}}, \qquad 0 < x < 0.05$$
Generate 1000 such variables and use them to estimate $E[X \mid X < 0.05]$. Then determine the exact value of $E[X \mid X < 0.05]$.


7. (The Composition Method) Suppose it is relatively easy to generate random variables from any of the distributions $F_i$, $i = 1, \ldots, n$. How could we generate a random variable having the distribution function
$$F(x) = \sum_{i=1}^{n} p_i F_i(x)$$
where $p_i$, $i = 1, \ldots, n$, are nonnegative numbers whose sum is 1?
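8. Using the result of Exercise 7, give algorithms for generating random variables from the following distributions.

(a) $F(x) = \frac{x + x^3 + x^5}{3}$, $0 \le x \le 1$

(b) $F(x) = \begin{cases} \frac{1 - e^{-2x} + 2x}{3} & \text{if } 0 < x < 1 \\ \frac{3 - e^{-2x}}{3} & \text{if } 1 < x < \infty \end{cases}$

(c) $F(x) = \sum_{i=1}^{n} \alpha_i x^i$, $0 < x < 1$, where $\alpha_i \ge 0$, $\sum_{i=1}^{n} \alpha_i = 1$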
9. Give a method to generate a random variable having distribution function
$$F(x) = \int_0^\infty x^y e^{-y}\,dy, \qquad 0 \le x \le 1$$
[Hint: Think in terms of the composition method of Exercise 7. In particular, let $F$ denote the distribution function of $X$, and suppose that the conditional distribution of $X$ given that $Y = y$ is
$$P\{X \le x \mid Y = y\} = x^y, \qquad 0 \le x \le 1$$
]


10. A casualty insurance company has 1000 policyholders, each of whom will 
independently present a claim in the next month with probability .05. Assuming 
that the amounts of the claims made are independent exponential random vari- 
ables with mean $800, use simulation to estimate the probability that the sum of 
these claims exceeds $50,000. 


11. Write an algorithm that can be used to generate exponential random vari- 
ables in sets of 3. Compare the computational requirements of this method with 
the one presented after Example 5c which generates them in pairs. 


12. Suppose it is easy to generate random variables from any of the distributions $F_i$, $i = 1, \ldots, n$. How can we generate from the following distributions?

(a) $F(x) = \prod_{i=1}^{n} F_i(x)$

(b) $F(x) = 1 - \prod_{i=1}^{n} [1 - F_i(x)]$

[Hint: If $X_i$, $i = 1, \ldots, n$, are independent random variables, with $X_i$ having distribution $F_i$, what random variable has distribution function $F$?]


13. Using the rejection method and the results of Exercise 12, give two other methods, aside from the inverse transform method, that can be used to generate a random variable having distribution function
$$F(x) = x^n, \qquad 0 \le x \le 1$$
Discuss the efficiency of the three approaches to generating from $F$.
14. Let $G$ be a distribution function with density $g$ and suppose, for constants $a < b$, we want to generate a random variable from the distribution function
$$F(x) = \frac{G(x) - G(a)}{G(b) - G(a)}, \qquad a \le x \le b$$

(a) If $X$ has distribution $G$, then $F$ is the conditional distribution of $X$ given what information?

(b) Show that the rejection method reduces in this case to generating a random variable $X$ having distribution $G$ and then accepting it if it lies between $a$ and $b$.


15. Give two methods for generating a random variable having density function
$$f(x) = x e^{-x}, \qquad 0 < x < \infty$$
and compare their efficiency.


16. Suppose that we want to generate a random variable $X$ whose density function is
$$f(x) = \frac{1}{2} e^{-\sqrt{x}}, \qquad x > 0$$
by using the rejection method with an exponential density having rate $\lambda$. Find the value of $\lambda$ that minimizes the expected number of iterations of the algorithm used to generate $X$.


17. Give an algorithm that generates a random variable having density
$$f(x) = 30(x^2 - 2x^3 + x^4), \qquad 0 \le x \le 1$$
Discuss the efficiency of this approach.


18. Give an efficient method to generate a random variable $X$ having density
$$f(x) = \frac{1}{0.000336}\, x(1 - x)^3, \qquad 0.8 < x < 1$$
19. In Example 5f we simulated a normal random variable by using the rejection technique with an exponential distribution with rate 1. Show that among all exponential density functions $g(x) = \lambda e^{-\lambda x}$ the number of iterations needed is minimized when $\lambda = 1$.


20. Write a program that generates normal random variables by the method of 
Example 5f. 


21. Let $(X, Y)$ be uniformly distributed in a circle of radius 1. Show that if $R$ is the distance from the center of the circle to $(X, Y)$ then $R^2$ is uniform on $(0, 1)$.


22. Write a program that generates the first $T$ time units of a Poisson process having rate $\lambda$.


23. To complete a job a worker must go through $k$ stages in sequence. The time to complete stage $i$ is an exponential random variable with rate $\lambda_i$, $i = 1, \ldots, k$. However, after completing stage $i$ the worker will only go to the next stage with probability $\alpha_i$, $i = 1, \ldots, k - 1$. That is, after completing stage $i$ the worker will stop working with probability $1 - \alpha_i$. If we let $X$ denote the amount of time that the worker spends on the job, then $X$ is called a Coxian random variable. Write an algorithm for generating such a random variable.


24. Buses arrive at a sporting event according to a Poisson process with rate 5 


per hour. Each bus is equally likely to contain either 20,21,...,40 fans, with 


the numbers in the different buses being independent. Write an algorithm to 
simulate the arrival of fans to the event by time t = 1. 


25. (a) Write a program that uses the thinning algorithm to generate the first 10 time units of a nonhomogeneous Poisson process with intensity function
$$\lambda(t) = 3 + \frac{4}{t + 1}$$

(b) Give a way to improve upon the thinning algorithm for this example.


26. Give an efficient algorithm to generate the first 10 time units of a nonhomogeneous Poisson process having intensity function


i 0<t<5 
A(t) = 3? 
2) a 5<r<10 
The Discrete Event
Simulation Approach 


Introduction 


Simulating a probabilistic model involves generating the stochastic mechanisms 
of the model and then observing the resultant flow of the model over time. 
Depending on the reasons for the simulation, there will be certain quantities of 
interest that we will want to determine. However, because the model’s evolution 
over time often involves a complex logical structure of its elements, it is not 
always apparent how to keep track of this evolution so as to determine these 
quantities of interest. A general framework, built around the idea of “discrete 
events,” has been developed to help one follow a model over time and determine 
the relevant quantities of interest. The approach to simulation based on this 
framework is often referred to as the discrete event simulation approach. 


6.1 Simulation via Discrete Events


The key elements in a discrete event simulation are variables and events. To do the simulation we continually keep track of certain variables. In general, there are three types of variables that are often utilized: the time variable, the counter variables, and the system state variable.


Variables


1. Time variable $t$: This refers to the amount of (simulated) time that has elapsed.
2. Counter variables: These variables keep a count of the number of times that certain events have occurred by time $t$.
3. System state (SS) variable: This describes the "state of the system" at the time $t$.
Whenever an “event” occurs the values of the above variables are changed,
or updated, and we collect, as output, any relevant data of interest. In order to 
determine when the next event will occur, an “event list,” which lists the nearest 
future events and when they are scheduled to occur, is maintained. Whenever 
an event “occurs” we then reset the time and all state and counter variables and 
collect the relevant data. In this way we are able to “follow” the system as it 
evolves over time. 

As the preceding is only meant to give a very rough idea of the elements of a 
discrete event simulation, it is useful to look at some examples. In Section 6.2 we 
consider the simulation of a single-server waiting line, or queueing, system. In 
Sections 6.3 and 6.4 we consider multiple-server queueing systems. The model of Section 6.3 supposes that the servers are arranged in a series fashion, and that of Section 6.4 that they are arranged in a parallel fashion. In Section 6.5 we consider an inventory stocking model, in Section 6.6 an insurance risk model, and in Section 6.7 a multimachine repair problem. In Section 6.8 we consider a model concerning stock options.

In all the queueing models, we suppose that the customers arrive in accordance
with a nonhomogeneous Poisson process with a bounded intensity function
$\lambda(t)$, $t > 0$. In simulating these models we will make use of the following
subroutine to generate the value of a random variable $T_s$, defined to equal the
time of the first arrival after time $s$.

Let $\lambda$ be such that $\lambda(t) \le \lambda$ for all $t$. Assuming that $\lambda(t)$, $t > 0$, and $\lambda$ are
specified, the following subroutine generates the value of $T_s$.


A Subroutine for Generating $T_s$


STEP 1: Let $t = s$.

STEP 2: Generate $U$.

STEP 3: Let $t = t - \frac{1}{\lambda}\log U$.

STEP 4: Generate $U$.

STEP 5: If $U \le \lambda(t)/\lambda$, set $T_s = t$ and stop.
STEP 6: Go to Step 2.
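As a rough sketch, and only one possible coding of it, the six steps might look as follows in Python. The name `next_arrival_time`, the callable `lam_t` standing for $\lambda(t)$, `lam_bound` for the bound $\lambda$, and the optional `random.Random` argument `rng` are illustrative assumptions. Using $1 - U$ in place of $U$ keeps the logarithm finite without changing the distribution.

```python
import math
import random


def next_arrival_time(s, lam_t, lam_bound, rng=None):
    """Return T_s, the time of the first arrival after time s, for a
    nonhomogeneous Poisson process whose intensity satisfies lam_t(t) <= lam_bound."""
    rng = rng or random.Random()
    t = s                                         # Step 1: let t = s
    while True:
        u = 1.0 - rng.random()                    # Step 2: U in (0, 1], so log(U) is finite
        t -= math.log(u) / lam_bound              # Step 3: t = t - (1/lambda) log U
        if rng.random() <= lam_t(t) / lam_bound:  # Steps 4-5: accept with prob. lambda(t)/lambda
            return t                              # T_s = t
        # Step 6: otherwise return to Step 2
```

When `lam_t` is constant and equal to `lam_bound`, every candidate point is accepted and $T_s - s$ is simply exponential with rate $\lambda$, which gives a quick check of the routine.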


6.2 A Single-Server Queueing System 


Consider a service station in which customers arrive in accordance with a nonhomogeneous
Poisson process with intensity function $\lambda(t)$, $t > 0$. There is a single
server, and upon arrival a customer either enters service if this server is free at 
that moment or else joins the waiting queue if the server is busy. When the server 
completes serving a customer, it then either begins serving the customer that had 
been waiting the longest (the so-called “first come first served” discipline) if 
there are any waiting customers, or, if there are no waiting customers, it remains 
free until the next customer’s arrival. The amount of time it takes to service a
customer is a random variable (independent of all other service times and of 
the arrival process), having probability distribution G. In addition, there is a 
fixed time T after which no additional arrivals are allowed to enter the system, 
although the server completes servicing all those that are already in the system 
at time T. 

Suppose that we are interested in simulating the above system to determine 
such quantities as (a) the average time a customer spends in the system and (b) 
the average time past T that the last customer departs—that is, the average time 
at which the server can go home. 

To do a simulation of the preceding system we use the following variables: 


Time Variable $t$
Counter Variables $N_A$: the number of arrivals (by time $t$)
$N_D$: the number of departures (by time $t$)
System State Variable $n$: the number of customers in the system (at time $t$)


Since the natural time to change the above quantities is when there is either an 
arrival or a departure, we take these as the “events.” That is, there are two types 
of event: arrivals and departures. The event list contains the time of the next 
arrival and the time of the departure of the customer presently in service. That 
is, the event list is 


EL $= t_A, t_D$


where $t_A$ is the time of the next arrival (after $t$) and $t_D$ is the service completion
time of the customer presently being served. If there is no customer presently
being served, then $t_D$ is set equal to $\infty$.

The output variables that will be collected are $A(i)$, the arrival time of customer
$i$; $D(i)$, the departure time of customer $i$; and $T_p$, the time past $T$ that the last
customer departs.

To begin the simulation, we initialize the variables and the event times as 
follows: 


Initialize 


Set $t = N_A = N_D = 0$.
Set SS $= 0$.
Generate $T_0$, and set $t_A = T_0$, $t_D = \infty$.


To update the system, we move along the time axis until we encounter the 
next event. To see how this is accomplished, we must consider different cases, 
depending on which member of the event list is smaller. In the following, $Y$
refers to a service time random variable having distribution $G$.


$t$ = time variable, SS $= n$, EL $= t_A, t_D$


Case 1: $t_A \le t_D$, $t_A \le T$


Reset: $t = t_A$ (we move along to time $t_A$).

Reset: $N_A = N_A + 1$ (since there is an additional arrival at time $t_A$).

Reset: $n = n + 1$ (because there is now one more customer).

Generate $T_t$, and reset $t_A = T_t$ (this is the time of the next arrival).

If $n = 1$, generate $Y$ and reset $t_D = t + Y$ (because the system had been empty
and so we need to generate the service time of the new customer).

Collect output data $A(N_A) = t$ (because customer $N_A$ arrived at time $t$).


Case 2: $t_D < t_A$, $t_D \le T$


Reset: $t = t_D$.

Reset: $n = n - 1$.

Reset: $N_D = N_D + 1$ (since a departure occurred at time $t$).

If $n = 0$, reset $t_D = \infty$; otherwise, generate $Y$ and reset $t_D = t + Y$.
Collect the output data $D(N_D) = t$ (since customer $N_D$ just departed).


Case 3: $\min(t_A, t_D) > T$, $n > 0$


Reset: $t = t_D$

Reset: $n = n - 1$

Reset: $N_D = N_D + 1$

If $n > 0$, generate $Y$ and reset $t_D = t + Y$.
Collect the output data $D(N_D) = t$.


Case 4: $\min(t_A, t_D) > T$, $n = 0$
Collect output data $T_p = \max(t - T, 0)$.
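The four cases can be combined into a single event loop. The Python sketch below is one hedged way to do it; the function name, the argument names, and the choice to pass the service distribution $G$ as a callable `service(rng)` are assumptions for illustration. It reuses the thinning subroutine of Section 6.1 for $T_t$ and stores $A(i)$ and $D(i)$ in dictionaries keyed by customer number.

```python
import math
import random


def simulate_single_server(lam_t, lam_bound, service, T, rng):
    """One run of the single-server model. Returns (A, D, Tp): arrival and
    departure times keyed by customer number, and the time past T at which
    the last customer departs."""

    def next_arrival(s):
        # Thinning subroutine for T_s (Steps 1-6 of Section 6.1).
        t = s
        while True:
            t -= math.log(1.0 - rng.random()) / lam_bound
            if rng.random() <= lam_t(t) / lam_bound:
                return t

    t, NA, ND, n = 0.0, 0, 0, 0               # Initialize
    tA, tD = next_arrival(0.0), math.inf
    A, D = {}, {}
    while True:
        if tA <= tD and tA <= T:              # Case 1: an arrival
            t = tA
            NA += 1
            n += 1
            tA = next_arrival(t)
            if n == 1:                        # server was idle: start service
                tD = t + service(rng)
            A[NA] = t
        elif tD < tA and tD <= T:             # Case 2: a departure by time T
            t = tD
            n -= 1
            ND += 1
            tD = math.inf if n == 0 else t + service(rng)
            D[ND] = t
        elif n > 0:                           # Case 3: past T, customers remain
            t = tD
            n -= 1
            ND += 1
            if n > 0:
                tD = t + service(rng)
            D[ND] = t
        else:                                 # Case 4: past T, system empty
            return A, D, max(t - T, 0.0)
```

Averaging the returned `Tp` over many runs, and `D[i] - A[i]` over all customers of all runs, gives the two estimates of interest.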


The preceding is illustrated in the flow diagram presented in Figure 6.1. Each time
we arrive at the “stop” box we would have collected the data $N_A$, the total number
of arrivals, which will equal $N_D$, the total number of departures. For each $i$,
$i = 1, \ldots, N_A$, we have $A(i)$ and $D(i)$, the respective arrival and departure times of
customer $i$ [and thus $D(i) - A(i)$ represents the amount of time that customer $i$ spent
in the system]. Finally, we will have $T_p$, the time past $T$ at which the last customer
departed. Each time we collect the above data we say that a simulation run has been
completed. After each run we then reinitialize and generate another run until it has
been decided that enough data have been collected. (In Chapter 7 we consider the
question of when to end the simulation.) The average of all the values of $T_p$ that have
been generated will be our estimate of the mean time past $T$ that the last customer
departs; similarly, the average of all the observed values of $D - A$ (i.e., the average
time, over all customers observed in all our simulation runs, that a customer spends
in the system) will be our estimate of the average time that a customer spends in the
system.

Figure 6.1. Simulating the Single Server Queue.


Remark If we want to save output data giving the number of customers in
the system at each point of time, all that is necessary is to output the system
state and time variable pair $(n, t)$ whenever an event occurs. For instance, if the
data $(1, 4)$ and $(0, 6)$ were output then, with $n(t)$ being the number in the system
at time $t$, we would know that

$$n(t) = \begin{cases} 0, & \text{if } 0 \le t < 4 \\ 1, & \text{if } 4 \le t < 6 \\ 0, & \text{if } t = 6 \end{cases}$$
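A minimal sketch of how the recorded $(n, t)$ pairs determine $n(t)$ at any intermediate time: $n$ stays at the value output by the most recent event at or before $t$. The function name `number_in_system` is hypothetical, and the sketch assumes, as in the Remark, that the system is empty before the first recorded event.

```python
import bisect


def number_in_system(events, t):
    """Return n(t) from the (n, time) pairs output at each event.

    events must be sorted by time; the system is assumed empty before the
    first recorded event (as in the Remark, where n(t) = 0 on [0, 4)).
    """
    times = [time for _, time in events]
    k = bisect.bisect_right(times, t)   # number of events at or before time t
    return 0 if k == 0 else events[k - 1][0]
```

With the Remark's data, `number_in_system([(1, 4.0), (0, 6.0)], 5.0)` gives 1.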


6.3 A Queueing System with Two Servers in Series 


Consider a two-server system in which customers arrive in accordance with a
nonhomogeneous Poisson process, and suppose that each arrival must first be
served by server 1 and upon completion of service at 1 the customer goes over to
server 2. Such a system is called a tandem or sequential queueing system. Upon
arrival the customer will either enter service with server 1 if that server is free,
or join the queue of server 1 otherwise. Similarly, when the customer completes
service at server 1 it then either enters service with server 2 if that server is free,
or else it joins its queue. After being served at server 2 the customer departs
the system. The service times at server $i$ have distribution $G_i$, $i = 1, 2$. (See
Figure 6.2.)

Figure 6.2. A Tandem Queue.

Suppose that we are interested in using simulation to study the distribution of 
the amounts of time that a customer spends both at server 1 and at server 2. To 
do so, we will use the following variables. 


Time Variable $t$

System State (SS) Variable
$(n_1, n_2)$: if there are $n_1$ customers at server 1 (including both those in queue
and in service) and $n_2$ at server 2

Counter Variables
$N_A$: the number of arrivals by time $t$
$N_D$: the number of departures by time $t$

Output Variables
$A_1(n)$: the arrival time of customer $n$, $n \ge 1$
$A_2(n)$: the arrival time of customer $n$ at server 2, $n \ge 1$
$D(n)$: the departure time of customer $n$, $n \ge 1$

Event List $t_A, t_1, t_2$, where $t_A$ is the time of the next arrival, and $t_i$ is the
service completion time of the customer presently being served by server
$i$, $i = 1, 2$. If there is no customer presently with server $i$, then $t_i = \infty$,
$i = 1, 2$. The event list always consists of the three variables $t_A, t_1, t_2$.


To begin the simulation, we initialize the variables and the event list as 
follows: 


Initialize 


Set $t = N_A = N_D = 0$.
Set SS $= (0, 0)$.
Generate $T_0$, and set $t_A = T_0$, $t_1 = t_2 = \infty$.
To update the system, we move along in time until we encounter the next event.
We must consider different cases, depending on which member of the event list
is smallest. In the following, $Y_i$ refers to a random variable having distribution
$G_i$, $i = 1, 2$.


SS $= (n_1, n_2)$, EL $= t_A, t_1, t_2$


Case 1: $t_A = \min(t_A, t_1, t_2)$


Reset: $t = t_A$.

Reset: $N_A = N_A + 1$.

Reset: $n_1 = n_1 + 1$.

Generate $T_t$, and reset $t_A = T_t$.

If $n_1 = 1$, generate $Y_1$ and reset $t_1 = t + Y_1$.
Collect output data $A_1(N_A) = t$.


Case 2: $t_1 < t_A$, $t_1 \le t_2$


Reset: $t = t_1$.

Reset: $n_1 = n_1 - 1$, $n_2 = n_2 + 1$.

If $n_1 = 0$, reset $t_1 = \infty$; otherwise, generate $Y_1$ and reset $t_1 = t + Y_1$.
If $n_2 = 1$, generate $Y_2$ and reset $t_2 = t + Y_2$.

Collect the output data $A_2(N_A - n_1) = t$.


Case 3: $t_2 < t_A$, $t_2 < t_1$


Reset: $t = t_2$.

Reset: $N_D = N_D + 1$.

Reset: $n_2 = n_2 - 1$.

If $n_2 = 0$, reset $t_2 = \infty$.

If $n_2 > 0$, generate $Y_2$, and reset $t_2 = t + Y_2$.
Collect the output data $D(N_D) = t$.


Using the preceding updating scheme it is now an easy matter to simulate the 
system and collect the relevant data. 
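One hedged Python sketch of that updating scheme follows. Because the text does not fix a stopping rule for this model, the sketch simply stops after `max_departures` customers have left; that rule, the function name, and the callables `next_arrival(t, rng)` (for $T_t$, e.g. via the thinning subroutine), `service1(rng)` and `service2(rng)` (for $Y_1$, $Y_2$) are all assumptions.

```python
import math
import random


def simulate_tandem(next_arrival, service1, service2, max_departures, rng):
    """Event-driven tandem queue. Returns dicts A1, A2, D: arrival times at
    server 1, arrival times at server 2, and departure times, by customer."""
    t = 0.0
    NA = ND = 0
    n1 = n2 = 0
    tA, t1, t2 = next_arrival(0.0, rng), math.inf, math.inf
    A1, A2, D = {}, {}, {}
    while ND < max_departures:
        if tA <= t1 and tA <= t2:             # Case 1: a new arrival
            t = tA
            NA += 1
            n1 += 1
            tA = next_arrival(t, rng)
            if n1 == 1:
                t1 = t + service1(rng)
            A1[NA] = t
        elif t1 < tA and t1 <= t2:            # Case 2: service completion at server 1
            t = t1
            n1 -= 1
            n2 += 1
            t1 = math.inf if n1 == 0 else t + service1(rng)
            if n2 == 1:
                t2 = t + service2(rng)
            A2[NA - n1] = t                   # customer N_A - n_1 moves to server 2
        else:                                 # Case 3: service completion at server 2
            t = t2
            ND += 1
            n2 -= 1
            t2 = math.inf if n2 == 0 else t + service2(rng)
            D[ND] = t
    return A1, A2, D
```

Since both stations serve in order of arrival, customers leave in the order they came, which is why `D` can be indexed by the departure counter.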


6.4 A Queueing System with Two Parallel Servers 


Consider a model in which customers arrive at a system having two servers. 
Upon arrival the customer will join the queue if both servers are busy, enter 
service with server 1 if that server is free, or enter service with server 2 otherwise. 
When the customer completes service with a server (no matter which one), 
that customer then departs the system and the customer that has been in queue 
the longest (if there are any customers in queue) enters service. The service 
distribution at server $i$ is $G_i$, $i = 1, 2$. (See Figure 6.3.)

Figure 6.3. A Queue with Two Parallel Servers.


Suppose that we want to simulate the preceding model, keeping track of the 
amounts of time spent in the system by each customer, and the number of services 
performed by each server. Because there are multiple servers, it follows that 
customers will not necessarily depart in the order in which they arrive. Hence, 
to know which customer is departing the system upon a service completion we 
will have to keep track of which customers are in the system. So let us number 
the customers as they arrive, with the first arrival being customer number 1, the 
next being number 2, and so on. Because customers enter service in order of 
their arrival, it follows that knowing which customers are being served and how 
many are waiting in queue enables us to identify the waiting customers. Suppose
that customers $i$ and $j$ are being served, where $i < j$, and that there are $n - 2 > 0$
others waiting in queue. Because all customers with numbers less than $j$ would
have entered service before $j$, whereas no customer whose number is higher than
$j$ could yet have completed service (because to do so they would have had to
enter service before either $i$ or $j$), it follows that customers $j + 1, \ldots, j + n - 2$
are waiting in queue.
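The identification argument reduces to a one-line rule: with $j$ the larger of the two customer numbers in service, the queue holds $j + 1, \ldots, j + n - 2$. A small sketch, with the hypothetical name `waiting_customers`:

```python
def waiting_customers(n, i1, i2):
    """Given SS = (n, i1, i2), return the numbers of the customers waiting in
    queue: j+1, ..., j+n-2 with j = max(i1, i2). Empty when n <= 2."""
    if n <= 2:
        return []
    j = max(i1, i2)
    return list(range(j + 1, j + n - 1))   # n - 2 customers after j
```

For example, with SS $= (5, 3, 4)$ the waiting customers are 5, 6 and 7.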

To analyze the system we will use the following variables: 


Time Variable t 

System State Variable (SS) 

$(n, i_1, i_2)$ if there are $n$ customers in the system, $i_1$ is with server 1 and
$i_2$ is with server 2. Note that SS $= (0)$ when the system is empty, and
SS $= (1, j, 0)$ or $(1, 0, j)$ when the only customer is $j$ and he is being
served by server 1 or server 2, respectively.

Counter Variables
$N_A$: the number of arrivals by time $t$
$C_j$: the number of customers served by $j$, $j = 1, 2$, by time $t$

Output Variables
$A(n)$: the arrival time of customer $n$, $n \ge 1$
$D(n)$: the departure time of customer $n$, $n \ge 1$

Event list $t_A, t_1, t_2$

where $t_A$ is the time of the next arrival, and $t_i$ is the service completion time
of the customer presently being served by server $i$, $i = 1, 2$. If there is
no customer presently with server $i$, then we set $t_i = \infty$, $i = 1, 2$. In the
following, the event list will always consist of the three variables $t_A, t_1, t_2$.


To begin the simulation, we initialize the variables and event list as follows: 


Initialize 


Set $t = N_A = C_1 = C_2 = 0$.
Set SS $= (0)$.
Generate $T_0$, and set $t_A = T_0$, $t_1 = t_2 = \infty$.


To update the system, we move along in time until we encounter the next event. 
In the following cases, $Y_i$ always refers to a random variable having distribution
$G_i$, $i = 1, 2$.
Case 1: SS $= (n, i_1, i_2)$ and $t_A = \min(t_A, t_1, t_2)$

Reset: $t = t_A$.

Reset: $N_A = N_A + 1$.

Generate $T_t$ and reset $t_A = T_t$.

Collect the output data $A(N_A) = t$.
If SS $= (0)$:

Reset: SS $= (1, N_A, 0)$.

Generate $Y_1$ and reset $t_1 = t + Y_1$.
If SS $= (1, j, 0)$:


Reset: SS $= (2, j, N_A)$.
Generate $Y_2$ and reset $t_2 = t + Y_2$.


If SS $= (1, 0, j)$:


Reset: SS $= (2, N_A, j)$.
Generate $Y_1$ and reset $t_1 = t + Y_1$.


If $n > 1$:
Reset: SS $= (n + 1, i_1, i_2)$.
Case 2: SS $= (n, i_1, i_2)$ and $t_1 < t_A$, $t_1 \le t_2$


Reset: $t = t_1$.

Reset: $C_1 = C_1 + 1$.

Collect the output data $D(i_1) = t$.
If $n = 1$:


Reset: SS $= (0)$.
Reset: $t_1 = \infty$.
If $n = 2$:


Reset: SS $= (1, 0, i_2)$.
Reset: $t_1 = \infty$.


If $n > 2$: Let $m = \max(i_1, i_2)$ and


Reset: SS $= (n - 1, m + 1, i_2)$.
Generate $Y_1$ and reset $t_1 = t + Y_1$.


Case 3: SS $= (n, i_1, i_2)$ and $t_2 < t_A$, $t_2 < t_1$
The updatings in Case 3 are left as an exercise.


If we simulate the system according to the preceding, stopping the simulation 
at some predetermined termination point, then by using the output variables as 
well as the final values of the counter variables $C_1$ and $C_2$, we obtain data on
the arrival and departure times of the various customers as well as on the number 
of services performed by each server. 


6.5 An Inventory Model 


Consider a shop that stocks a particular type of product that it sells for a price
of $r$ per unit. Customers demanding this product appear in accordance with a
Poisson process with rate $\lambda$, and the amount demanded by each one is a random
variable having distribution G. In order to meet demands, the shopkeeper must 
keep an amount of the product on hand, and whenever the on-hand inventory 
becomes low, additional units are ordered from the distributor. The shopkeeper 
uses a so-called (s, S) ordering policy; namely, whenever the on-hand inventory 
is less than s and there is no presently outstanding order, then an amount is 
ordered to bring it up to S, where s < S. That is, if the present inventory level 
is x and no order is outstanding, then if x < s the amount S—x is ordered. The 
cost of ordering y units of the product is a specified function c(y), and it takes 
L units of time until the order is delivered, with the payment being made upon 
delivery. In addition, the shop pays an inventory holding cost of h per unit item 
per unit time. Suppose further that whenever a customer demands more of the 
product than is presently available, then the amount on hand is sold and the 
remainder of the order is lost to the shop. 

Let us see how we can use simulation to estimate the shop’s expected profit 
up to some fixed time T. To do so, we start by defining the variables and events 
as follows. 


Time Variable $t$
System State Variable $(x, y)$
where $x$ is the amount of inventory on hand, and $y$ is the amount on order.
Counter Variables
$C$, the total amount of ordering costs by $t$
$H$, the total amount of inventory holding costs by $t$
$R$, the total amount of revenue earned by time $t$
Events will consist of either a customer or an order arriving. The event times are
$t_0$, the arrival time of the next customer
$t_1$, the time at which the order being filled will be delivered. If there is no
outstanding order then we take the value of $t_1$ to be $\infty$.


The updating is accomplished by considering which of the event times is smaller.
If we are presently at time $t$ and we have the values of the preceding variables,
then we move along in time as follows.


Case 1: $t_0 < t_1$


Reset: $H = H + (t_0 - t)xh$ since between times $t$ and $t_0$ we incur a holding
cost of $(t_0 - t)h$ for each of the $x$ units in inventory.

Reset: $t = t_0$.

Generate $D$, a random variable having distribution $G$. $D$ is the demand of the
customer that arrived at time $t_0$.

Let $w = \min(D, x)$ be the amount of the order that can be filled. The inventory
after filling this order is $x - w$.

Reset: $R = R + wr$.

Reset: $x = x - w$.

If $x < s$ and $y = 0$ then reset $y = S - x$, $t_1 = t + L$.

Generate $U$ and reset $t_0 = t - \frac{1}{\lambda}\log(U)$.


Case 2: $t_1 \le t_0$
Reset: $H = H + (t_1 - t)xh$.


Reset: $t = t_1$.
Reset: $C = C + c(y)$.
Reset: $x = x + y$.


Reset: $y = 0$, $t_1 = \infty$.


By using the preceding updating schedule it is easy to write a simulation program 
to analyze the model. We could then run the simulation until the first event occurs 
after some large preassigned time T, and we could then use (R — C — H)/T as 
an estimate of the shop’s average profit per unit time. Doing this for varying 
values of s and S would then enable us to determine a good inventory ordering 
policy for the shop. 
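A minimal sketch of such a program, under stated assumptions: the text does not fix the starting inventory, so it is passed in as a hypothetical parameter `x0`; the demand distribution $G$ is a callable `demand(rng)`; and the ordering cost $c(y)$ is a callable `order_cost(y)`. Interarrival times are drawn with `expovariate`, which has the same distribution as $-\frac{1}{\lambda}\log U$.

```python
import math
import random


def inventory_profit_rate(lam, demand, r, h, L, s, S, order_cost, T, x0, rng):
    """One run of the (s, S) inventory model; returns (R - C - H) / T."""
    t = 0.0
    x, y = x0, 0                       # inventory on hand, amount on order
    C = H = R = 0.0                    # ordering cost, holding cost, revenue
    t0 = rng.expovariate(lam)          # arrival time of the next customer
    t1 = math.inf                      # delivery time of the outstanding order
    while t <= T:                      # stop at the first event after time T
        if t0 < t1:                    # Case 1: a customer arrives
            H += (t0 - t) * x * h
            t = t0
            w = min(demand(rng), x)    # amount of the demand that can be filled
            R += w * r
            x -= w
            if x < s and y == 0:       # (s, S) policy: order up to S
                y, t1 = S - x, t + L
            t0 = t + rng.expovariate(lam)
        else:                          # Case 2: the order is delivered
            H += (t1 - t) * x * h
            t = t1
            C += order_cost(y)
            x += y
            y, t1 = 0, math.inf
    return (R - C - H) / T
```

Calling this for a grid of $(s, S)$ pairs and averaging over several runs per pair is one way to compare ordering policies.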


6.6 An Insurance Risk Model 


Suppose that the different policyholders of a casualty insurance company generate
claims according to independent Poisson processes with a common rate $\lambda$,
and that each claim amount has distribution $F$. Suppose also that new customers
sign up according to a Poisson process with rate $\nu$, and that each existing
policyholder remains with the company for an exponentially distributed time with
rate $\mu$. Finally, suppose that each policyholder pays the insurance firm at a fixed
rate $c$ per unit time. Starting with $n_0$ customers and initial capital $a_0 \ge 0$, we are
interested in using simulation to estimate the probability that the firm’s capital
remains nonnegative at all times up to time $T$.

To simulate the preceding, we define the variables and events as follows. 


Time Variable t 

System State Variable (n, a), where n is the number of policyholders and a 
is the firm’s current capital. 

Events There are three types of events: a new policyholder, a lost policy- 
holder, and a claim. The event list consists of a single value, equal to the 
time at which the next event occurs. 

EL: $t_E$


We are able to have the event list consist solely of the time of the next event
because of results about exponential random variables that were presented in
Section 2.9. Specifically, if $(n, a)$ is the system state at time $t$ then, because the
minimum of independent exponential random variables is also exponential, the
time at which the next event occurs will equal $t + X$, where $X$ is an exponential
random variable with rate $\nu + n\mu + n\lambda$. Moreover, no matter when this next
event occurs, it will result from

A new policyholder, with probability $\dfrac{\nu}{\nu + n\mu + n\lambda}$

A lost policyholder, with probability $\dfrac{n\mu}{\nu + n\mu + n\lambda}$

A claim, with probability $\dfrac{n\lambda}{\nu + n\mu + n\lambda}$

After determining when the next event occurs, we generate a random number
to determine which of the three possibilities caused the event, and then use this
information to determine the new value of the system state variable.

In the following, for given state variable $(n, a)$, $X$ will be an exponential
random variable with rate $\nu + n\mu + n\lambda$; $J$ will be a random variable equal to 1
with probability $\nu/(\nu + n\mu + n\lambda)$, to 2 with probability $n\mu/(\nu + n\mu + n\lambda)$,
or to 3 with probability $n\lambda/(\nu + n\mu + n\lambda)$; $Y$ will be a random variable
having the claim distribution $F$.

Output Variable $I$, where

$$I = \begin{cases} 1, & \text{if the firm's capital is nonnegative throughout } [0, T] \\ 0, & \text{otherwise} \end{cases}$$


To simulate the system, we initialize the variables as follows. 
Initialize


First initialize $t = 0$, $a = a_0$, $n = n_0$,


then generate $X$ and initialize $t_E = X$.


To update the system we move along to the next event, first checking whether 
it takes us past time T. 


Update Step 


Case 1: $t_E > T$:

Set $I = 1$ and end this run.
Case 2: $t_E \le T$:

Reset


$a = a + nc(t_E - t)$
$t = t_E$


Generate $J$:


$J = 1$: reset $n = n + 1$

$J = 2$: reset $n = n - 1$

$J = 3$: Generate $Y$. If $Y > a$, set $I = 0$ and end this run; otherwise reset
$a = a - Y$


Generate $X$: reset $t_E = t + X$


The update step is then continually repeated until a run is completed. 
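A hedged Python sketch of one run follows; averaging the returned value of $I$ over many runs estimates the desired probability. The function and argument names are illustrative, the claim distribution $F$ is passed as a callable `claim(rng)`, and the sketch assumes $\nu > 0$ so that the total event rate $\nu + n\mu + n\lambda$ never vanishes.

```python
import random


def capital_stays_nonnegative(nu, mu, lam, claim, c, n0, a0, T, rng):
    """One run of the insurance risk model; returns the output variable I."""
    t, n, a = 0.0, n0, a0                           # Initialize
    tE = rng.expovariate(nu + n * mu + n * lam)     # t_E = X
    while True:
        if tE > T:                                  # Case 1: next event is past T
            return 1
        a += n * c * (tE - t)                       # Case 2: premiums earned up to t_E
        t = tE
        rate = nu + n * mu + n * lam
        u = rng.random() * rate                     # generate J
        if u < nu:                                  # J = 1: new policyholder
            n += 1
        elif u < nu + n * mu:                       # J = 2: lost policyholder
            n -= 1
        else:                                       # J = 3: a claim
            y = claim(rng)
            if y > a:
                return 0
            a -= y
        tE = t + rng.expovariate(nu + n * mu + n * lam)  # generate X
```

Choosing $J$ by comparing a single uniform on $[0, \nu + n\mu + n\lambda)$ against the cumulative rates is equivalent to the three probabilities listed above.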


6.7 A Repair Problem 


A system needs n working machines to be operational. To guard against 
machine breakdown, additional machines are kept available as spares. Whenever 
a machine breaks down it is immediately replaced by a spare and is itself sent 
to the repair facility, which consists of a single repairperson who repairs failed 
machines one at a time. Once a failed machine has been repaired it becomes 
available as a spare to be used when the need arises (see Figure 6.4). All 
repair times are independent random variables having the common distribution 
function $G$. Each time a machine is put into use the amount of time it functions
before breaking down is a random variable, independent of the past, having
distribution function $F$.

The system is said to “crash” when a machine fails and no spares are available. 
Assuming that there are initially $n + s$ functional machines of which $n$ are put
in use and s are kept as spares, we are interested in simulating this system so as 
to approximate E[T], where T is the time at which the system crashes. 

To simulate the preceding we utilize the following variables. 


Time Variable t 
System State Variable r: the number of machines that are down at time t 


Since the system state variable will change either when a working machine 
breaks down or when a repair is completed, we say that an “event” occurs 
whenever either of these occurs. In order to know when the next event will 
occur, we need to keep track of the times at which the machines presently in use 
will fail and the time at which the machine presently being repaired (if there is 
a machine in repair) will complete its repair. Because we will always need to 
determine the smallest of the n failure times, it is convenient to store these n 
times in an ordered list. Thus it is convenient to let the event list be as follows: 


Event List: $t_1 \le t_2 \le \cdots \le t_n$, $t^*$


where $t_1, \ldots, t_n$ are the times (in order) at which the $n$ machines presently in use
will fail, and $t^*$ is the time at which the machine presently in repair will become
operational, or if there is no machine presently being repaired then $t^* = \infty$.

To begin the simulation, we initialize these quantities as follows. 


Figure 6.4. Repair Model.
Initialize


Set $t = r = 0$, $t^* = \infty$.

Generate $X_1, \ldots, X_n$, independent random variables each having distribution
$F$. Order these values and let $t_i$ be the $i$th smallest one, $i = 1, \ldots, n$.

Set Event list: $t_1, \ldots, t_n, t^*$.


Updating of the system proceeds according to the following two cases. 


Case 1: $t_1 < t^*$


Reset: $t = t_1$.

Reset: $r = r + 1$ (because another machine has failed).

If $r = s + 1$, stop this run and collect the data $T = t$ (since, as there are now
$s + 1$ machines down, no spares are available).

If $r < s + 1$, generate a random variable $X$ having distribution $F$. This random
variable will represent the working time of the spare that will now be put
into use. Now reorder the values $t_2, t_3, \ldots, t_n, t + X$ and let $t_i$ be the $i$th
smallest of these values, $i = 1, \ldots, n$.

If $r = 1$, generate a random variable $Y$ having distribution function $G$ and
reset $t^* = t + Y$. (This is necessary because in this case the machine that
has just failed is the only failed machine and thus repair will immediately
begin on it; $Y$ will be its repair time and so its repair will be completed at
time $t + Y$.)


Case 2: $t^* \le t_1$


Reset: $t = t^*$.

Reset: $r = r - 1$.

If $r > 0$, generate a random variable $Y$ having distribution function $G$, and
representing the repair time of the machine just entering service, and reset
$t^* = t + Y$.

If $r = 0$, set $t^* = \infty$.


The above rules for updating are illustrated in Figure 6.5. 

Each time we stop (which occurs when r = s+1) we say that a run is 
completed. The output for the run is the value of the crash time T. We then 
reinitialize and simulate another run. In all, we do a total of, say, k runs with the 
successive output variables being $T_1, \ldots, T_k$. Since these $k$ random variables
are independent and each represents a crash time, their average, $\sum_{i=1}^{k} T_i/k$, is
the estimate of E[T], the mean crash time. The question of determining when 
to stop the simulation—that is, determining the value of k—is considered in 
Chapter 7, which presents the methods used to statistically analyze the output 
from simulation runs. 
Figure 6.5. Simulating the Repair Model.
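The two updating cases translate into a short loop. In this Python sketch, offered as an illustration rather than the text's own program, a binary min-heap (`heapq`) stands in for the ordered list $t_1 \le \cdots \le t_n$: it yields the smallest failure time directly and avoids a full reorder after each failure. The names `crash_time`, `failure(rng)` (a draw from $F$) and `repair(rng)` (a draw from $G$) are hypothetical.

```python
import heapq
import math
import random


def crash_time(n, s, failure, repair, rng):
    """One run of the repair model; returns the crash time T."""
    t, r = 0.0, 0                                  # time, number of machines down
    t_star = math.inf                              # completion time of current repair
    fails = [failure(rng) for _ in range(n)]       # failure times of machines in use
    heapq.heapify(fails)                           # min-heap in place of the sorted list
    while True:
        if fails[0] < t_star:                      # Case 1: a machine fails
            t = heapq.heappop(fails)
            r += 1
            if r == s + 1:                         # no spare available: crash
                return t
            heapq.heappush(fails, t + failure(rng))  # a spare goes into use
            if r == 1:                             # repairperson was idle
                t_star = t + repair(rng)
        else:                                      # Case 2: a repair completes
            t = t_star
            r -= 1
            t_star = t + repair(rng) if r > 0 else math.inf
```

Averaging `crash_time` over $k$ independent runs gives the estimate $\sum_{i=1}^{k} T_i/k$ of $E[T]$.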


6.8 Exercising a Stock Option 


Let $S_n$, $n \ge 0$, denote the price of a specified stock at the end of day $n$. A common
model is to suppose that

$$S_n = S_0 \exp\{X_1 + \cdots + X_n\}, \quad n \ge 0$$

where $X_1, X_2, \ldots$ is a sequence of independent normal random variables, each
with mean $\mu$ and variance $\sigma^2$. This model, which supposes that each day’s
percentage increase in price over the previous day has a common distribution, is
called the lognormal random walk model. Let $\alpha = \mu + \sigma^2/2$. Suppose now that
you own an option to purchase one unit of this stock at a fixed price $K$, called
the striking price, at the end of any of the next $N$ days. If you exercise this
option when the stock’s price is $S$ then, because you only pay the amount $K$,
we will call this a gain of $S - K$ (since you could theoretically immediately turn
around and sell the stock at the price $S$). The expected gain in owning the option
(which clearly would never be exercised if the stock’s price does not exceed $K$
during the time period of interest) depends on the option exercising policy you
employ. Now, it can be shown that if $\alpha \ge 0$ then the optimal policy is to wait
until the last possible moment and then exercise the option if the price exceeds
$K$ and not exercise otherwise. Since $X_1 + \cdots + X_N$ is a normal random variable
with mean $N\mu$ and variance $N\sigma^2$, it is not difficult to explicitly compute the
return from this policy. However, it is not at all easy to characterize an optimal,
or even a near optimal, policy when $\alpha < 0$, and for any reasonably good policy
it is not possible to explicitly evaluate the expected gain. We will now give a
policy that can be employed when $\alpha < 0$. This policy, although far from being an
optimal policy, appears to be reasonably good. It calls for exercising the option
when there are $m$ days to go whenever, for each $i = 1, \ldots, m$, that action leads
to a higher expected payoff than letting exactly $i$ days go by and then either
exercising (if the price at that point is greater than $K$) or giving up on ever
exercising.

Let $P_m = S_{N-m}$ denote the price of the stock when there are $m$ days to go
before the option expires. The policy we suggest is as follows:


Policy: If there are $m$ days to go, then exercise the option at this time if

$$P_m > K$$

and, if for each $i = 1, \ldots, m$,

$$P_m > K + P_m e^{i\alpha}\,\Phi(\sigma\sqrt{i} + b_i) - K\,\Phi(b_i)$$

where

$$b_i = \frac{i\mu - \log(K/P_m)}{\sigma\sqrt{i}}$$

and where $\Phi(x)$ is the standard normal distribution function and can be accurately
approximated by the following formula: For $x \ge 0$

$$\Phi(x) \approx 1 - \frac{1}{\sqrt{2\pi}}\left(a_1 y + a_2 y^2 + a_3 y^3\right)e^{-x^2/2}$$

For $x < 0$, $\Phi(x) = 1 - \Phi(-x)$; where

$$y = \frac{1}{1 + 0.33267x}$$

$a_1 = 0.4361836$
$a_2 = -0.1201676$
$a_3 = 0.9372980$

Let $SP$ denote the price of the stock when the option is exercised, if it is
exercised, and let $SP$ be $K$ if the option is never exercised. To determine the
expected worth of the preceding policy, that is, to determine $E[SP] - K$, it is
necessary to resort to simulation. For given parameters $\mu, \sigma, N, K, S_0$ it is easy
enough to simulate the price of the stock on separate days by generating $X$, a
normal random variable with mean $\mu$ and standard deviation $\sigma$, and then using
the relation

$$P_{m-1} = P_m e^X$$

Thus, if $P_m$ is the price with $m$ days to go and the policy does not call for
exercising the option at this time, then we would generate $X$ and determine
the new price $P_{m-1}$ and have the computer check whether the policy calls for
exercising at this point. If so, then for that simulation run $SP = P_{m-1}$; if not,
then we would determine the price at the end of the next day, and so on. The
average value, over a large number of simulation runs, of $SP - K$ would then
be our estimate of the expected value of owning the option when you are using
the preceding policy.
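A hedged Python sketch of this simulation is given below. It assumes that exercise is possible only at the end of days $1, \ldots, N$ (so the first price examined is $S_1$, with $m = N - 1$ days to go), and it evaluates $\Phi$ through `math.erf` rather than the polynomial approximation; the function names are illustrative. With $m = 0$ the loop over $i$ is empty, so on the last day the policy exercises exactly when $P_0 > K$.

```python
import math
import random


def Phi(x):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def should_exercise(P, m, K, mu, sigma):
    """Suggested policy: exercise now, with m days to go and price P = P_m?"""
    if P <= K:
        return False
    alpha = mu + sigma ** 2 / 2.0
    for i in range(1, m + 1):
        b = (i * mu - math.log(K / P)) / (sigma * math.sqrt(i))
        wait_i = P * math.exp(i * alpha) * Phi(sigma * math.sqrt(i) + b) - K * Phi(b)
        if P <= K + wait_i:                    # waiting i days looks at least as good
            return False
    return True


def simulate_gain(S0, N, K, mu, sigma, rng):
    """One run; returns SP - K (zero if the option is never exercised)."""
    P = S0
    for m in range(N - 1, -1, -1):             # end of day N - m, with m days to go
        P *= math.exp(rng.gauss(mu, sigma))    # P_m = P_{m+1} e^X, X ~ N(mu, sigma^2)
        if should_exercise(P, m, K, mu, sigma):
            return P - K                       # SP = P_m
    return 0.0                                 # never exercised: SP = K
```

The mean of `simulate_gain` over many runs is then the estimate of $E[SP] - K$ described above.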


6.9 Verification of the Simulation Model 


The end product of the discrete event approach to simulation is a computer 
program that one hopes is free of error. To verify that there are indeed no 
bugs in the program, one should, of course, use all the “standard” techniques 
of debugging computer programs. However, there are several techniques that 
are particularly applicable in debugging simulation models, and we now discuss 
some of them. 

As with all large programs one should attempt to debug in “modules” or 
subroutines. That is, one should attempt to break down the program into small 
and manageable entities that are logical wholes and then attempt to debug these 
entities. For example, in simulation models the generation of random variables 
constitutes one such module, and these modules should be checked separately. 

The simulation should always be written broadly with a large number of input 
variables. Oftentimes by choosing suitable values we can reduce the simulation 
model to one that can be evaluated analytically or that has been previously 
extensively studied, so as to compare our simulated results with known answers. 

In the testing stage, the program should be written to give as output all the 
random quantities it generates. By suitably choosing simple special cases, we 
can then compare the simulated output with the answer worked out by hand. For 
example, suppose we are simulating the first T time units of a k server queueing 
system. After inputting the values T = 8 (meant to be a small number) and k = 2, 
suppose the simulation program generates the following data: 


Customer number: 1 2 3 4 5 6 
Arrival time: 1.5 3.6 3.9 5.2 6.4 7.7 
Service time: 3.4 2.2 5.1 2.4 3.3 6.2 
and suppose that the program gives as output that the average time spent in the
system by these six customers is 5.12.

However, by going through the calculations by hand, we see that the first 
customer spent 3.4 time units in the system; the second spent 2.2 (recall there are 
two servers); the third arrived at time 3.9, entered service at time 4.9 (when the 
first customer left), and spent 5.1 time units in service—thus, customer 3 spent 
a time 6.1 in the system; customer 4 arrived at time 5.2, entered service at time 
5.8 (when number 2 departed), and departed after an additional time 2.4—thus, 
customer 4 spent a time 3.0 in the system; and so on. These calculations are 
presented below: 


Arrival time:              1.5   3.6    3.9   5.2    6.4    7.7
Time when service began:   1.5   3.6    4.9   5.8    8.2   10.0
Departure time:            4.9   5.8   10.0   8.2   11.5   16.2
Time in system:            3.4   2.2    6.1   3.0    5.1    8.5


Hence, the output for the average time spent in the system by all arrivals up to
time T = 8 should have been

$$\frac{3.4+2.2+6.1+3.0+5.1+8.5}{6} = 4.71666\ldots$$

thus showing that there is an error in the computer program which gave the
output value 5.12.
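As a cross-check of the hand calculation, the following minimal Python sketch (our own illustration, not the book's program) recomputes each customer's time in the system. The function name `times_in_system` is hypothetical, and the sketch assumes, as the calculation above does, that customers are served in order of arrival by whichever of the k servers becomes free first.

```python
import heapq

def times_in_system(arrivals, services, k):
    """Time in system of each customer in a k-server first-come first-served queue.

    Each arrival starts service at the later of its arrival time and the
    earliest time at which one of the k servers becomes free.
    """
    free_at = [0.0] * k                  # time at which each server next becomes free
    heapq.heapify(free_at)
    result = []
    for a, s in zip(arrivals, services):
        earliest = heapq.heappop(free_at)   # the server that frees up first
        start = max(a, earliest)
        depart = start + s
        heapq.heappush(free_at, depart)
        result.append(depart - a)
    return result

arrivals = [1.5, 3.6, 3.9, 5.2, 6.4, 7.7]
services = [3.4, 2.2, 5.1, 2.4, 3.3, 6.2]
w = times_in_system(arrivals, services, k=2)
print([round(x, 1) for x in w])          # [3.4, 2.2, 6.1, 3.0, 5.1, 8.5]
print(round(sum(w) / len(w), 5))         # 4.71667
```

A program whose output disagrees with such an independent check, as the 5.12 above does, contains an error.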

A useful technique when searching for errors in the computer program is 
to utilize a trace. In a trace, the state variable, the event list, and the counter 
variables are all printed out after each event occurs. This allows one to follow 
the simulated system over time so as to determine when it is not performing as
intended. (If no errors are apparent when following such a trace, one should then
check the calculations relating to the output variables.)
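To illustrate what a trace might look like, here is a small event-driven sketch for the same two-server example. Its choices are assumptions made for illustration only: the state variable is taken to be the number of customers in the system, the event list holds (time, event type, customer) entries, and the single counter variable is the accumulated time in system of departed customers. The function name `traced_run` is hypothetical.

```python
import heapq
from collections import deque

def traced_run(arrivals, services, k):
    """Event-driven k-server FCFS queue that prints a trace after every event.

    State variable: number of customers in the system.
    Event list:     pending (time, event type, customer index) entries.
    Counter:        total time in system of customers who have departed.
    Returns the average time in system of all customers.
    """
    events = [(a, "arrival", i) for i, a in enumerate(arrivals)]
    heapq.heapify(events)
    waiting = deque()     # customers in queue, in order of arrival
    busy = 0              # number of busy servers
    in_system = 0
    total_time = 0.0
    while events:
        t, kind, i = heapq.heappop(events)
        if kind == "arrival":
            in_system += 1
            if busy < k:
                busy += 1
                heapq.heappush(events, (t + services[i], "departure", i))
            else:
                waiting.append(i)
        else:  # departure of customer i
            in_system -= 1
            total_time += t - arrivals[i]
            if waiting:                      # the freed server takes the next in line
                j = waiting.popleft()
                heapq.heappush(events, (t + services[j], "departure", j))
            else:
                busy -= 1
        pending = [(round(s, 1), e, c + 1) for s, e, c in sorted(events)]
        print(f"t={t:5.2f} {kind} of {i + 1}: in system={in_system}, "
              f"time so far={total_time:.2f}, event list={pending}")
    return total_time / len(arrivals)

arrivals = [1.5, 3.6, 3.9, 5.2, 6.4, 7.7]
services = [3.4, 2.2, 5.1, 2.4, 3.3, 6.2]
avg = traced_run(arrivals, services, k=2)
print(f"average time in system: {avg:.5f}")   # average time in system: 4.71667
```

Reading the printed lines in order, one can check each departure time against the hand-computed table and spot the first event at which a faulty program diverges.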


Exercises 

1. Write a program to generate the desired output for the model of Section 6.2. 
Use it to estimate the average time that a customer spends in the system and the 
average amount of overtime put in by the server, in the case where the arrival 
process is a Poisson process with rate 10, the service time density is 


$$g(x) = 20e^{-40x}(40x)^2, \qquad x > 0$$


and T = 9. First try 100 runs and then 1000. 
2. Suppose in the model of Section 6.2 that we also wanted to obtain informa-
tion about the amount of idle time a server would experience in a day. Explain
how this could be accomplished.


3. Suppose that jobs arrive at a single server queueing system according to a 
nonhomogeneous Poisson process, whose rate is initially 4 per hour, increases 
steadily until it hits 19 per hour after 5 hours, and then decreases steadily until 
it hits 4 per hour after an additional 5 hours. The rate then repeats indefinitely 
in this fashion, that is, $\lambda(t + 10) = \lambda(t)$. Suppose that the service distribution
is exponential with rate 25 per hour. Suppose also that whenever the server 
completes a service and finds no jobs waiting he goes on break for a time that 
is uniformly distributed on (0, 0.3). If upon returning from his break there are 
no jobs waiting, then he goes on another break. Use simulation to estimate the 
expected amount of time that the server is on break in the first 100 hours of 
operation. Do 500 simulation runs. 


4. Fill in the updating scheme for Case 3 in the model of Section 6.4. 


5. Consider a single-server queueing model in which customers arrive accord- 
ing to a nonhomogeneous Poisson process. Upon arriving they either enter 
service if the server is free or else they join the queue. Suppose, however, that 
each customer will only wait a random amount of time, having distribution 
F, in queue before leaving the system. Let G denote the service distribution. 
Define variables and events so as to analyze this model, and give the updating 
procedures. Suppose we are interested in estimating the average number of lost 
customers by time $T$, where a customer that departs before entering service is
considered lost. 


6. Suppose in Exercise 5 that the arrival process is a Poisson process with rate 5; F 
is the uniform distribution on (0, 5); and G is the exponential distribution with
rate 4. Do 500 simulation runs to estimate the expected number of lost customers 
by time 100. Assume that customers are served in their order of arrival. 


7. Repeat Exercise 6, this time supposing that each time the server completes 
a service, the next customer to be served is the one who has the earliest queue 
departure time. That is, if two customers are waiting and one would depart the 
queue if his service has not yet begun by time $t_1$, and the other if her service has
not yet begun by time $t_2$, then the former would enter service if $t_1 < t_2$ and the
latter otherwise. Do you think this will increase or decrease the average number 
that depart before entering service? 


8. In the model of Section 6.4, suppose that $G_1$ is the exponential distribution
with rate 4 and $G_2$ is exponential with rate 3. Suppose that the arrivals are
according to a Poisson process with rate 6. Write a simulation program to 
generate data corresponding to the first 1000 arrivals. Use it to estimate 


(a) the average time spent in the system by these customers. 
(b) the proportion of services performed by server 1.
(c) Do a second simulation of the first 1000 arrivals and use it to answer 
parts (a) and (b). Compare your answers to the ones previously obtained. 


9. Suppose in the two-server parallel model of Section 6.4 that each server has 
its own queue, and that upon arrival a customer joins the shortest one. An arrival 
finding both queues at the same size (or finding both servers empty) goes to 
server 1. 


(a) Determine appropriate variables and events to analyze this model and 
give the updating procedure. 


Using the same distributions and parameters as in Exercise 8, find 


(b) the average time spent in the system by the first 1000 customers. 
(c) the proportion of the first 1000 services performed by server 1. 


Before running your program, do you expect your answers in parts (b) and (c) 
to be larger or smaller than the corresponding answers in Exercise 8? 


10. Suppose in Exercise 9 that each arrival is sent to server 1 with probability 
p, independent of anything else. 


(a) Determine appropriate variables and events to analyze this model and 
give the updating procedure. 

(b) Using the parameters of Exercise 9, and taking p equal to your estimate 
of part (c) of that problem, simulate the system to estimate the quantities 
defined in part (b) of Exercise 9. Do you expect your answer to be larger 
or smaller than that obtained in Exercise 9? 


11. Suppose that claims are made to an insurance company according to a Pois- 
son process with rate 10 per day. The amount of a claim is a random variable 
that has an exponential distribution with mean $1000. The insurance company 
receives payments continuously in time at a constant rate of $11,000 per day. 
Starting with an initial capital of $25,000, use simulation to estimate the proba- 
bility that the firm’s capital is always positive throughout its first 365 days. 


12. Suppose in the model of Section 6.6 that, conditional on the event that the 
firm’s capital goes negative before time T, we are also interested in the time at 
which it becomes negative and the amount of the shortfall. Explain how we can 
use the given simulation methodology to obtain relevant data. 


13. For the repair model presented in Section 6.7: 
(a) Write a computer program for this model. 


(b) Use your program to estimate the mean crash time in the case where
$n = 4$, $s = 3$, $F(x) = 1 - e^{-x}$, and $G(x) = 1 - e^{-2x}$.
14. In the model of Section 6.7, suppose that the repair facility consists of two
servers, each of whom takes a random amount of time having distribution G to 
service a failed machine. Draw a flow diagram for this system. 


15. A system experiences shocks that occur in accordance with a Poisson proc- 
ess having a rate of 1/hour. Each shock has a certain amount of damage associated 
with it. These damages are assumed to be independent random variables (which 
are also independent of the times at which the shocks occur), having the common 
density function 


$$f(x) = xe^{-x}, \qquad x > 0$$


Damages dissipate in time at an exponential rate $\alpha$; that is, a shock whose initial
damage is $x$ will have remaining damage value $xe^{-\alpha s}$ at time $s$ after it occurs. In
addition, the damage values are cumulative. Thus, for example, if by time $t$ there
have been a total of two shocks, which originated at times $t_1$ and $t_2$ and had initial
damages $x_1$ and $x_2$, then the total damage at time $t$ is $\sum_{i=1}^{2} x_i e^{-\alpha(t - t_i)}$. The system
fails when the total damage exceeds some fixed constant $C$.


(a) Suppose we are interested in utilizing a simulation study to estimate 
the mean time at which the system fails. Define the “events” and 
“variables” of this model and draw a flow diagram indicating how the 
simulation is to be run. 

(b) Write a program that would generate k runs. 

(c) Verify your program by comparing output with a by-hand calculation. 

(d) With $\alpha = 0.5$, $C = 5$, and $k = 1000$, run your program and use the
output to estimate the expected time until the system fails. 


16. Messages arrive at a communications facility in accordance with a Poisson 
process having a rate of 2/hour. The facility consists of three channels, and an 
arriving message will either go to a free channel if any of them are free or else will 
be lost if all channels are busy. The amount of time that a message ties up a channel 
is a random variable that depends on the weather condition at the time the message
arrives. Specifically, if the message arrives when the weather is “good,” then its 
processing time is a random variable having distribution function 


$$F_1(x) = x, \qquad 0 < x < 1$$


whereas if the weather is “bad” when a message arrives, then its processing time 
has distribution function 


$$F_2(x) = x^3, \qquad 0 < x < 1$$


Initially, the weather is good, and it alternates between good and bad periods— 
with the good periods having fixed lengths of 2 hours and the bad periods having 
fixed lengths of 1 hour. (Thus, for example, at time 5 the weather changes from
good to bad.) 
Suppose we are interested in the distribution of the number of lost messages by 
time T = 100. 


(a) Define the events and variables that enable us to use the discrete event 
approach. 

(b) Write a flow diagram of the above. 

(c) Write a program for the above. 

(d) Verify your program by comparing an output with a hand calculation. 

(e) Run your program to estimate the mean number of lost messages in the 
first 100 hours of operation. 


17. Estimate, by a simulation study, the expected worth of owning an option to 
purchase a stock anytime in the next 20 days for a price of 100 if the present price 
of the stock is 100. Assume the model of Section 6.8, with $\mu = -0.05$, $\sigma = 0.3$,
and employ the strategy presented there. 
Statistical Analysis of Simulated Data


Introduction 


A simulation study is usually undertaken to determine the value of some quantity
$\theta$ connected with a particular stochastic model. A simulation of the relevant
system results in the output data $X$, a random variable whose expected value
is the quantity of interest $\theta$. A second independent simulation (that is, a second
simulation run) provides a new and independent random variable having
mean $\theta$. This continues until we have amassed a total of $k$ runs, and the $k$
independent random variables $X_1, \ldots, X_k$, all of which are identically distributed
with mean $\theta$. The average of these $k$ values, $\bar X = \sum_{i=1}^k X_i/k$, is then used as an
estimator, or approximator, of $\theta$.

In this chapter we consider the problem of deciding when to stop the simulation
study, that is, deciding on the appropriate value of $k$. To help us decide when
to stop, we will find it useful to consider the quality of our estimator of $\theta$. In
addition, we will also show how to obtain an interval in which we can assert
that $\theta$ lies, with a certain degree of confidence.

The final section of this chapter shows how we can estimate the quality of more 
complicated estimators than the sample mean—by using an important statistical 
technique known as “bootstrap estimators.” 


7.1 The Sample Mean and Sample Variance 
Suppose that $X_1, \ldots, X_n$ are independent random variables having the same
distribution function. Let $\theta$ and $\sigma^2$ denote, respectively, their mean and variance,
that is, $\theta = E[X_i]$ and $\sigma^2 = \mathrm{Var}(X_i)$. The quantity

$$\bar X = \sum_{i=1}^n \frac{X_i}{n}$$

which is the arithmetic average of the $n$ data values, is called the sample mean.
When the population mean $\theta$ is unknown, the sample mean is often used to
estimate it.

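Because

$$E[\bar X] = E\left[\sum_{i=1}^n \frac{X_i}{n}\right] = \sum_{i=1}^n \frac{E[X_i]}{n} = \frac{n\theta}{n} = \theta \tag{7.1}$$

it follows that $\bar X$ is an unbiased estimator of $\theta$, where we say that an estimator
of a parameter is an unbiased estimator of that parameter if its expected value is
equal to the parameter.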

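To determine the "worth" of $\bar X$ as an estimator of the population mean $\theta$,
we consider its mean square error, that is, the expected value of the squared
difference between $\bar X$ and $\theta$. Now

$$\begin{aligned}
E[(\bar X - \theta)^2] &= \mathrm{Var}(\bar X) && (\text{since } E[\bar X] = \theta) \\
&= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) \\
&= \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i) && (\text{by independence}) \\
&= \frac{\sigma^2}{n} && (\text{since } \mathrm{Var}(X_i) = \sigma^2)
\end{aligned} \tag{7.2}$$

Thus, $\bar X$, the sample mean of the $n$ data values $X_1, \ldots, X_n$, is a random
variable with mean $\theta$ and variance $\sigma^2/n$. Because a random variable is unlikely to
be too many standard deviations (the standard deviation being the square root of
the variance) from its mean, it follows that $\bar X$ is a good estimator of $\theta$ when
$\sigma/\sqrt{n}$ is small.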


Remark The justification for the above statement that a random variable is
unlikely to be too many standard deviations away from its mean follows from
both the Chebyshev inequality and, more importantly for simulation studies,
from the central limit theorem. Indeed, for any $c > 0$, Chebyshev's inequality
(see Section 2.7 of Chapter 2) yields the rather conservative bound

$$P\left\{|\bar X - \theta| > \frac{c\sigma}{\sqrt{n}}\right\} \le \frac{1}{c^2}$$

However, when $n$ is large, as will usually be the case in simulations, we can
apply the central limit theorem to assert that $(\bar X - \theta)/(\sigma/\sqrt{n})$ is approximately
distributed as a standard normal random variable; and thus

$$P\left\{|\bar X - \theta| > \frac{c\sigma}{\sqrt{n}}\right\} \approx P\{|Z| > c\} = 2[1 - \Phi(c)] \tag{7.3}$$

where $Z$ is a standard normal random variable and $\Phi$ is the standard normal
distribution function. For example, since $\Phi(1.96) = 0.975$, Equation (7.3) states
that the probability that the sample mean differs from $\theta$ by more than
$1.96\sigma/\sqrt{n}$ is approximately 0.05, whereas the weaker Chebyshev inequality
only yields that this probability is less than $1/(1.96)^2 = 0.2603$. □
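The two numbers in this remark can be checked directly. This short sketch uses Python's standard `statistics.NormalDist` (available from Python 3.8) for $\Phi$; the variable names are ours.

```python
from statistics import NormalDist

c = 1.96
clt_tail = 2 * (1 - NormalDist().cdf(c))   # right-hand side of (7.3)
chebyshev_bound = 1 / c ** 2               # Chebyshev bound 1/c^2
print(round(clt_tail, 4), round(chebyshev_bound, 4))   # 0.05 0.2603
```

The gap between the two values is why the normal approximation, rather than Chebyshev's inequality, is used to judge simulation estimators.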


The difficulty with directly using the value of $\sigma^2/n$ as an indication of how
well the sample mean of $n$ data values estimates the population mean is that the
population variance $\sigma^2$ is not usually known. Thus, we also need to estimate it.
Since

$$\sigma^2 = E[(X - \theta)^2]$$

is the average of the square of the difference between a datum value and its
(unknown) mean, it might seem, upon using $\bar X$ as the estimator of the mean, that
a natural estimator of $\sigma^2$ would be $\sum_{i=1}^n (X_i - \bar X)^2/n$, the average of the squared
distances between the data values and the estimated mean. However, to make
the estimator unbiased (and for other technical reasons) we prefer to divide the
sum of squares by $n - 1$ rather than $n$.


Definition The quantity $S^2$, defined by

$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n - 1}$$

is called the sample variance.

Using the algebraic identity

$$\sum_{i=1}^n (X_i - \bar X)^2 = \sum_{i=1}^n X_i^2 - n\bar X^2 \tag{7.4}$$

whose proof is left as an exercise, we now show that the sample variance is an
unbiased estimator of $\sigma^2$.


Proposition 


$$E[S^2] = \sigma^2$$
Proof Using the identity (7.4) we see that

$$\begin{aligned}
(n-1)E[S^2] &= E\left[\sum_{i=1}^n X_i^2\right] - nE[\bar X^2] \\
&= nE[X_1^2] - nE[\bar X^2]
\end{aligned} \tag{7.5}$$

where the last equality follows since the $X_i$ all have the same distribution.
Recalling that for any random variable $Y$, $\mathrm{Var}(Y) = E[Y^2] - (E[Y])^2$ or, equivalently,

$$E[Y^2] = \mathrm{Var}(Y) + (E[Y])^2$$

we obtain that

$$E[X_1^2] = \mathrm{Var}(X_1) + (E[X_1])^2 = \sigma^2 + \theta^2$$

and

$$E[\bar X^2] = \mathrm{Var}(\bar X) + (E[\bar X])^2 = \frac{\sigma^2}{n} + \theta^2 \qquad \text{[from (7.2) and (7.1)]}$$

Thus, from Equation (7.5), we obtain that

$$(n-1)E[S^2] = n(\sigma^2 + \theta^2) - n\left(\frac{\sigma^2}{n} + \theta^2\right) = (n-1)\sigma^2$$

which proves the result. □


We use the sample variance $S^2$ as our estimator of the population variance $\sigma^2$,
and we use $S = \sqrt{S^2}$, the so-called sample standard deviation, as our estimator
of $\sigma$.
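The identity (7.4) and the proposition $E[S^2] = \sigma^2$ can both be checked numerically. The sketch below is our own illustration: it computes $S^2$ through (7.4), compares it with Python's `statistics.variance`, and then averages $S^2$ over many samples of size $n = 5$ drawn from a uniform (0, 1) distribution (an arbitrary choice, with $\sigma^2 = 1/12$).

```python
import random
import statistics

def sample_variance(xs):
    """S^2 computed via identity (7.4): (sum x_i^2 - n * xbar^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar * xbar) / (n - 1)

# Identity (7.4) agrees with the definition of S^2 (checked against the stdlib).
data = [5.0, 14.0, 9.0]
assert abs(sample_variance(data) - statistics.variance(data)) < 1e-12

# Monte Carlo check of E[S^2] = sigma^2 with uniform(0, 1) data (variance 1/12).
random.seed(1)
reps, n = 100_000, 5
avg_s2 = sum(sample_variance([random.random() for _ in range(n)])
             for _ in range(reps)) / reps
avg_biased = avg_s2 * (n - 1) / n      # what dividing by n instead of n - 1 would give
print(avg_s2, 1 / 12, avg_biased)
```

The average of $S^2$ should land close to $1/12 \approx 0.0833$, while dividing by $n$ instead would center near $(n-1)\sigma^2/n \approx 0.0667$, which is the bias the $n - 1$ divisor removes.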

Suppose now that, as in a simulation, we have the option of continually
generating additional data values $X_i$. If our objective is to estimate the value of
$\theta = E[X_i]$, when should we stop generating new data values? The answer to this
question is that we should first choose an acceptable value $d$ for the standard
deviation of our estimator, for if $d$ is the standard deviation of the estimator
$\bar X$, then we can, for example, be (approximately) 95% certain that $\bar X$ will not
differ from $\theta$ by more than $1.96d$. We should then continue to generate new data
until we have generated $n$ data values for which our estimate of $\sigma/\sqrt{n}$, namely
$S/\sqrt{n}$, is less than the acceptable value $d$. Since the sample standard deviation $S$
may not be a particularly good estimate of $\sigma$ (nor may the normal approximation
be valid) when the sample size is small, we thus recommend the following
procedure to determine when to stop generating new data values.
A Method for Determining When to Stop Generating New Data


1. Choose an acceptable value $d$ for the standard deviation of the estimator.

2. Generate at least 100 data values.

3. Continue to generate additional data values, stopping when you have
generated $k$ values and $S/\sqrt{k} < d$, where $S$ is the sample standard deviation
based on those $k$ values.

4. The estimate of $\theta$ is given by $\bar X = \sum_{i=1}^k X_i/k$.
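The four steps translate into a short loop. In this sketch (our own, with hypothetical names), `generate` stands in for one simulation run, and $S^2$ is obtained from running sums of $x$ and $x^2$ through identity (7.4), so the data need not be stored. The exponential output with mean 2 is an arbitrary test case.

```python
import math
import random

def estimate_until_precise(generate, d, min_runs=100):
    """Steps 1-4: stop once k >= min_runs and S / sqrt(k) < d.

    `generate` is a hypothetical zero-argument function that performs one
    simulation run and returns its output. S^2 comes from the running sums
    of x and x^2 via identity (7.4).
    """
    k, total, total_sq = 0, 0.0, 0.0
    while True:
        x = generate()
        k += 1
        total += x
        total_sq += x * x
        if k >= min_runs:
            mean = total / k
            s2 = max((total_sq - k * mean * mean) / (k - 1), 0.0)  # guard round-off
            if math.sqrt(s2 / k) < d:
                return mean, math.sqrt(s2), k

random.seed(2)
# Stand-in for a simulation run: exponential output with mean 2 (so sigma = 2).
mean, s, k = estimate_until_precise(lambda: random.expovariate(0.5), d=0.05)
print(mean, s, k)   # k should be roughly (2 / 0.05)**2 = 1600
```

The running-sum form of (7.4) is simple but can lose precision when the mean is large compared with the spread of the data; a numerically stabler update is preferable in that case.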


Example 7a Consider a service system in which no new customers are 
allowed to enter after 5 P.M. Suppose that each day follows the same probability 
law and that we are interested in estimating the expected time at which the last 
customer departs the system. Furthermore, suppose we want to be at least 95% 
certain that our estimated answer will not differ from the true value by more 
than 15 seconds. 

To satisfy the above requirement it is necessary that we continually generate
data values relating to the time at which the last customer departs (each time by
doing a simulation run) until we have generated a total of $k$ values, where $k$ is
at least 100 and is such that $1.96S/\sqrt{k} < 15$, where $S$ is the sample standard
deviation (measured in seconds) of these $k$ data values. Our estimate of the
expected time at which the last customer departs will be the average of the $k$
data values. □


In order to use the above technique for determining when to stop generating
new values, it would be valuable if we had a method for recursively computing
the successive sample means and sample variances, rather than having to
recompute from scratch each time a new datum value is generated. We now show
how this can be done. Consider the sequence of data values $X_1, X_2, \ldots$, and let

$$\bar X_j = \sum_{i=1}^j \frac{X_i}{j}$$

and

$$S_j^2 = \sum_{i=1}^j \frac{(X_i - \bar X_j)^2}{j - 1}, \qquad j \ge 2$$

denote, respectively, the sample mean and sample variance of the first $j$ data
values. The following recursion should be used to successively compute the
current value of the sample mean and sample variance.

With $S_1^2 = 0$, $\bar X_0 = 0$,

$$\bar X_{j+1} = \bar X_j + \frac{X_{j+1} - \bar X_j}{j + 1} \tag{7.6}$$

$$S_{j+1}^2 = \left(1 - \frac{1}{j}\right)S_j^2 + (j + 1)(\bar X_{j+1} - \bar X_j)^2, \qquad j \ge 1 \tag{7.7}$$


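Example 7b If the first three data values are $X_1 = 5$, $X_2 = 14$, $X_3 = 9$, then
Equations (7.6) and (7.7) yield that

$$\bar X_1 = 5, \qquad S_1^2 = 0$$

$$\bar X_2 = 5 + \frac{9}{2} = \frac{19}{2}, \qquad S_2^2 = 2\left(\frac{9}{2}\right)^2 = \frac{81}{2}$$

$$\bar X_3 = \frac{19}{2} + \frac{1}{3}\left(9 - \frac{19}{2}\right) = \frac{28}{3}, \qquad S_3^2 = \frac{81}{4} + 3\left(\frac{28}{3} - \frac{19}{2}\right)^2 = \frac{61}{3}$$
□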
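The recursions (7.6) and (7.7) can be coded directly. This is a minimal sketch with a hypothetical generator name, `running_mean_var`; it starts from $\bar X_0 = 0$ and $S_1^2 = 0$ exactly as the text does and skips (7.7) for $j = 0$, where $1/j$ is undefined.

```python
def running_mean_var(data):
    """Yield (j, Xbar_j, S_j^2) for j = 1, 2, ... using recursions (7.6) and (7.7).

    Starts from Xbar_0 = 0 and S_1^2 = 0, as in the text.
    """
    mean, s2 = 0.0, 0.0
    for j, x in enumerate(data):          # x is X_{j+1}
        new_mean = mean + (x - mean) / (j + 1)                       # (7.6)
        if j >= 1:                                                   # (7.7) needs j >= 1
            s2 = (1 - 1 / j) * s2 + (j + 1) * (new_mean - mean) ** 2
        mean = new_mean
        yield j + 1, mean, s2

for j, xbar, s2 in running_mean_var([5, 14, 9]):
    print(j, xbar, s2)
```

With the data of Example 7b this prints the means 5, 9.5, 9.333... and the variances 0, 40.5, 20.333..., that is, $28/3$ and $61/3$ up to floating-point rounding.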


The analysis is somewhat modified when the data values are Bernoulli (or 0,
1) random variables, as is the case when we are estimating a probability. That
is, suppose we can generate random variables $X_i$ such that

$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$

and suppose we are interested in estimating $E[X_i] = p$. Since, in this situation,

$$\mathrm{Var}(X_i) = p(1 - p)$$

there is no need to utilize the sample variance to estimate $\mathrm{Var}(X_i)$. Indeed, if
we have generated $n$ values $X_1, \ldots, X_n$, then, since the estimate of $p$ will be

$$\bar X_n = \sum_{i=1}^n \frac{X_i}{n}$$

a natural estimate of $\mathrm{Var}(X_i)$ is $\bar X_n(1 - \bar X_n)$. Hence, in this case, we have the
following method for deciding when to stop.


1. Choose an acceptable value $d$ for the standard deviation of the estimator.

2. Generate at least 100 data values.

3. Continue to generate additional data values, stopping when you have
generated $k$ values and $[\bar X_k(1 - \bar X_k)/k]^{1/2} < d$.

4. The estimate of $p$ is $\bar X_k$, the average of the $k$ data values.
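For 0-1 data the stopping rule needs only the running count of successes. In this sketch `trial` is a hypothetical stand-in for one simulation run that reports whether the event of interest occurred; the success probability 0.3 is an arbitrary test value.

```python
import math
import random

def estimate_probability(trial, d, min_runs=100):
    """Run Bernoulli trials until k >= min_runs and sqrt(p_k (1 - p_k) / k) < d.

    `trial` is a hypothetical zero-argument function returning 1 or 0 for one
    simulation run. Returns the estimate p_k = Xbar_k and the number of runs k.
    """
    k = successes = 0
    while True:
        successes += trial()
        k += 1
        if k >= min_runs:
            p = successes / k
            if math.sqrt(p * (1 - p) / k) < d:
                return p, k

random.seed(3)
p_hat, k = estimate_probability(lambda: int(random.random() < 0.3), d=0.01)
print(p_hat, k)   # k should be near 0.3 * 0.7 / 0.01**2 = 2100
```

One caveat of this rule: if the first 100 runs are all 0 or all 1, the estimated variance $\bar X_k(1 - \bar X_k)$ is zero and the loop stops at once, so for very rare events a larger minimum number of runs is advisable.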
Example 7c Suppose, in Example 7a, we were interested in estimating the
probability that there was still a customer in the store at 5:30. To do so, we
would simulate successive days and let

$$X_i = \begin{cases} 1 & \text{if there is a customer present at 5:30 on day } i \\ 0 & \text{otherwise} \end{cases}$$

We would simulate at least 100 days and continue to simulate until the $k$th day,
where $k$ is such that $[\hat p_k(1 - \hat p_k)/k]^{1/2} < d$, where $\hat p_k = \bar X_k$ is the proportion of
these $k$ days in which there is a customer present at 5:30 and where $d$ is an
acceptable value for the standard deviation of the estimator $\hat p_k$. □


7.2 Interval Estimates of a Population Mean 


Suppose again that $X_1, X_2, \ldots, X_n$ are independent random variables from a
common distribution having mean $\theta$ and variance $\sigma^2$. Although the sample mean
$\bar X = \sum_{i=1}^n X_i/n$ is an effective estimator of $\theta$, we do not really expect that $\bar X$ will
be equal to $\theta$ but rather that it will be "close." As a result, it is sometimes more
valuable to be able to specify an interval within which we have a certain degree of
confidence that $\theta$ lies.

To obtain such an interval we need the (approximate) distribution of the
estimator $\bar X$. To determine this, first recall, from Equations (7.1) and (7.2), that

$$E[\bar X] = \theta, \qquad \mathrm{Var}(\bar X) = \frac{\sigma^2}{n}$$

and thus, from the central limit theorem, it follows that for large $n$

$$\sqrt{n}\,\frac{\bar X - \theta}{\sigma} \;\dot\sim\; N(0, 1)$$

where $\dot\sim N(0, 1)$ means "is approximately distributed as a standard normal." In
addition, if we replace the unknown standard deviation $\sigma$ by its estimator $S$,
the sample standard deviation, then it still remains the case (by a result known
as Slutsky's theorem) that the resulting quantity is approximately a standard
normal. That is, when $n$ is large

$$\sqrt{n}(\bar X - \theta)/S \;\dot\sim\; N(0, 1) \tag{7.8}$$
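Now for any $\alpha$, $0 < \alpha < 1$, let $z_\alpha$ be such that

$$P\{Z > z_\alpha\} = \alpha$$

where $Z$ is a standard normal random variable. (For example, $z_{0.025} = 1.96$.) It
follows from the symmetry of the standard normal density function about the
origin that $z_{1-\alpha}$, the point at which the area under the density to its right is equal
to $1 - \alpha$, is such that (see Figure 7.1)

$$z_{1-\alpha} = -z_\alpha$$

[Figure 7.1. Standard normal density, with the equal tail areas $P\{Z < -x\}$ and $P\{Z > x\}$ marked at $-x$ and $x$.]

Therefore (see Figure 7.1)

$$P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1 - \alpha$$

It thus follows from (7.8) that

$$P\left\{-z_{\alpha/2} < \sqrt{n}\,\frac{\bar X - \theta}{S} < z_{\alpha/2}\right\} \approx 1 - \alpha$$

or, equivalently, upon multiplying by $-1$,

$$P\left\{-z_{\alpha/2} < \sqrt{n}\,\frac{\theta - \bar X}{S} < z_{\alpha/2}\right\} \approx 1 - \alpha$$

which is equivalent to

$$P\left\{\bar X - z_{\alpha/2}\frac{S}{\sqrt{n}} < \theta < \bar X + z_{\alpha/2}\frac{S}{\sqrt{n}}\right\} \approx 1 - \alpha \tag{7.9}$$

In other words, with probability (approximately) $1 - \alpha$ the population mean $\theta$ will lie
within the region $\bar X \pm z_{\alpha/2}S/\sqrt{n}$.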


Definition If the observed values of the sample mean and the sample standard
deviation are $\bar X = \bar x$ and $S = s$, call the interval $\bar x \pm z_{\alpha/2}s/\sqrt{n}$ an (approximate)
$100(1 - \alpha)$ percent confidence interval estimate of $\theta$.


Remarks


1. To clarify the meaning of a "$100(1 - \alpha)$ percent confidence interval," consider,
for example, the case where $\alpha = 0.05$, and so $z_{\alpha/2} = 1.96$. Now before
the data are observed, it will be true, with probability (approximately) equal
to 0.95, that the sample mean $\bar X$ and the sample standard deviation $S$ will
be such that $\theta$ will lie within $\bar X \pm 1.96S/\sqrt{n}$. After $\bar X$ and $S$ are observed
to equal, respectively, $\bar x$ and $s$, there is no longer any probability concerning
whether $\theta$ lies in the interval $\bar x \pm 1.96s/\sqrt{n}$, for either it does or it does not.
However, we are "95% confident" that in this situation it does lie in this
interval (because we know that over the long run such intervals will indeed
contain the mean 95 percent of the time).

2. (A technical remark.) The above analysis is based on Equation (7.8), which
states that $\sqrt{n}(\bar X - \theta)/S$ is approximately a standard normal random variable
when $n$ is large. Now if the original data values $X_i$ were themselves normally
distributed, then it is known that this quantity has (exactly) a $t$-distribution
with $n - 1$ degrees of freedom. For this reason, many authors have proposed
using this approximate distribution in the general case where the original
distribution need not be normal. However, since it is not clear that the
$t$-distribution with $n - 1$ degrees of freedom results in a better approximation
than the normal in the general case, and because these two distributions are
approximately equal for large $n$, we have used the normal approximation
rather than introducing the $t$-random variable. □
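The definition of the approximate confidence interval, and the long-run reading of it in Remark 1, can be illustrated together. The sketch below is our own: `normal_ci` (a hypothetical name) computes $\bar x \pm z_{\alpha/2}s/\sqrt{n}$ using `statistics.NormalDist().inv_cdf` (Python 3.8+) for $z_{\alpha/2}$, and the coverage loop uses exponential data with mean 1, an arbitrary choice.

```python
import math
import random
from statistics import NormalDist

def normal_ci(xs, alpha=0.05):
    """Approximate 100(1 - alpha)% interval xbar +/- z_{alpha/2} s / sqrt(n)."""
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}; about 1.96 for alpha = 0.05
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    half = z * s / math.sqrt(n)
    return xbar - half, xbar + half

# Long-run coverage (Remark 1): the fraction of intervals containing the true mean.
random.seed(4)
trials, hits = 2000, 0
for _ in range(trials):
    lo, hi = normal_ci([random.expovariate(1.0) for _ in range(200)])
    hits += lo < 1.0 < hi
coverage = hits / trials
print(coverage)   # should be close to 0.95
```

Because the exponential distribution is skewed, the observed coverage with $n = 200$ may fall slightly below 0.95, which reflects that the normal approximation in (7.8) is only approximate for finite $n$.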


Consider now the case, as in a simulation study, where additional data values
can be generated and the question is to determine when to stop generating new
data values. One solution to this is to initially choose values $\alpha$ and $l$ and to
continue generating data until the length of the approximate $100(1 - \alpha)$ percent
confidence interval estimate of $\theta$ is less than $l$. Since the length of this interval
will be $2z_{\alpha/2}S/\sqrt{n}$, we can accomplish this by the following technique.


1. Generate at least 100 data values.

2. Continue to generate additional data values, stopping when the number of
values you have generated, call it $k$, is such that $2z_{\alpha/2}S/\sqrt{k} < l$, where $S$
is the sample standard deviation based on those $k$ values. [The value of $S$
should be constantly updated, using the recursion given by (7.6) and (7.7),
as new data are generated.]

3. If $\bar x$ and $s$ are the observed values of $\bar X$ and $S$, then the $100(1 - \alpha)$ percent
confidence interval estimate of $\theta$, whose length is less than $l$, is $\bar x \pm z_{\alpha/2}s/\sqrt{k}$.
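The three steps above, with $S$ maintained by recursions (7.6) and (7.7), can be sketched as follows. The function name `run_until_ci_length` and the `generate` argument are hypothetical; the uniform (0, 10) output in the usage lines is an arbitrary test case with mean 5.

```python
import math
import random
from statistics import NormalDist

def run_until_ci_length(generate, length, alpha=0.05, min_runs=100):
    """Generate values until 2 z_{alpha/2} S / sqrt(k) < length (and k >= min_runs).

    Mean and variance are updated with recursions (7.6) and (7.7).
    `generate` is a hypothetical zero-argument function giving one run's output.
    Returns the interval (xbar - h, xbar + h) and k.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mean, s2, k = 0.0, 0.0, 0            # Xbar_0 = 0, S_1^2 = 0
    while True:
        x = generate()
        new_mean = mean + (x - mean) / (k + 1)                          # (7.6)
        if k >= 1:
            s2 = (1 - 1 / k) * s2 + (k + 1) * (new_mean - mean) ** 2    # (7.7)
        mean, k = new_mean, k + 1
        if k >= min_runs and 2 * z * math.sqrt(s2 / k) < length:
            half = z * math.sqrt(s2 / k)
            return (mean - half, mean + half), k

random.seed(5)
(lo, hi), k = run_until_ci_length(lambda: random.uniform(0, 10), length=0.2)
print(lo, hi, k)   # k should be roughly (2 * 1.96 * 2.887 / 0.2)**2, about 3200
```

Updating through the recursion keeps each step constant-time, so the stopping test can be applied after every new run.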


A Technical Remark The more statistically sophisticated reader might 
wonder about our use of an approximate confidence interval whose theory was 
based on the assumption that the sample size was fixed when in the above 
situation the sample size is clearly a random variable depending on the data 
values generated. This, however, can be justified when the sample size is large,
and so from the viewpoint of simulation we can safely ignore this subtlety. □


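As noted in the previous section, the analysis is modified when $X_1, \ldots, X_n$
are Bernoulli random variables such that

$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$

Since in this case $\mathrm{Var}(X_i)$ can be estimated by $\bar X(1 - \bar X)$, it follows that the
equivalent statement to Equation (7.8) is that when $n$ is large

$$\sqrt{n}\,\frac{\bar X - p}{\sqrt{\bar X(1 - \bar X)}} \;\dot\sim\; N(0, 1) \tag{7.10}$$

Hence, for any $\alpha$,

$$P\left\{-z_{\alpha/2} < \sqrt{n}\,\frac{\bar X - p}{\sqrt{\bar X(1 - \bar X)}} < z_{\alpha/2}\right\} \approx 1 - \alpha$$

or, equivalently,

$$P\left\{\bar X - z_{\alpha/2}\sqrt{\bar X(1 - \bar X)/n} < p < \bar X + z_{\alpha/2}\sqrt{\bar X(1 - \bar X)/n}\right\} \approx 1 - \alpha$$

Hence, if the observed value of $\bar X$ is $\hat p_n$, we say that the "$100(1 - \alpha)$ percent
confidence interval estimate" of $p$ is

$$\hat p_n \pm z_{\alpha/2}\sqrt{\hat p_n(1 - \hat p_n)/n}$$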
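As a small worked instance of this interval, the sketch below (hypothetical function name `proportion_ci`) evaluates $\hat p_n \pm z_{\alpha/2}\sqrt{\hat p_n(1 - \hat p_n)/n}$. The counts, 300 successes in 1000 runs, are made-up illustration values, not data from the text.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, alpha=0.05):
    """Approximate interval p_n +/- z_{alpha/2} sqrt(p_n (1 - p_n) / n)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

print(proportion_ci(300, 1000))   # about (0.2716, 0.3284)
```

Here $\sqrt{0.3 \cdot 0.7/1000} \approx 0.0145$, so the half-width is about $1.96 \times 0.0145 \approx 0.0284$.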


7.3 The Bootstrapping Technique for Estimating Mean Square Errors


Suppose now that $X_1, \ldots, X_n$ are independent random variables having a
common distribution function $F$, and suppose we are interested in using them to
estimate some parameter $\theta(F)$ of the distribution $F$. For example, $\theta(F)$ could
be (as in the previous sections of this chapter) the mean of $F$, or it could be the
median or the variance of $F$, or any other parameter of $F$. Suppose further that
an estimator of $\theta(F)$, call it $g(X_1, \ldots, X_n)$, has been proposed, and in order
to judge its worth as an estimator of $\theta(F)$ we are interested in estimating its
mean square error. That is, we are interested in estimating the value of

$$\mathrm{MSE}(F) = E_F[(g(X_1, \ldots, X_n) - \theta(F))^2]$$

[where our choice of notation $\mathrm{MSE}(F)$ suppresses the dependence on the
estimator $g$, and where we have used the notation $E_F$ to indicate that the
expectation is to be taken under the assumption that the random variables all
have distribution $F$]. Now whereas there is an immediate estimator of the above
MSE, namely $S^2/n$, when $\theta(F) = E[X_i]$ and $g(X_1, \ldots, X_n) = \bar X$, it is not at all
apparent how it can be estimated otherwise. We now present a useful technique,
known as the bootstrap technique, for estimating this mean square error.

To begin, note that if the distribution function $F$ were known then we could
theoretically compute the expected square of the difference between $\theta$ and its
estimator; that is, we could compute the mean square error. However, after we
observe the values of the $n$ data points, we have a pretty good idea what the
underlying distribution looks like. Indeed, suppose that the observed values of the
data are $X_i = x_i$, $i = 1, \ldots, n$. We can now estimate the underlying distribution
function $F$ by the so-called empirical distribution function $F_e$, where $F_e(x)$, the
estimate of $F(x)$, the probability that a datum value is less than or equal to $x$, is
just the proportion of the $n$ data values that are less than or equal to $x$. That is,

$$F_e(x) = \frac{\text{number of } i \colon x_i \le x}{n}$$

Another way of thinking about $F_e$ is that it is the distribution function of a random
variable $X_e$, which is equally likely to take on any of the $n$ values $x_i$, $i = 1, \ldots, n$.
(If the values $x_i$ are not all distinct, then the above is to be interpreted to mean
that $X_e$ will equal the value $x_i$ with a probability equal to the number of $j$ such
that $x_j = x_i$ divided by $n$; that is, if $n = 3$ and $x_1 = x_2 = 1$, $x_3 = 2$, then $X_e$
is a random variable that takes on the value 1 with probability $\frac{2}{3}$ and 2 with
probability $\frac{1}{3}$.)
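The empirical distribution function is easy to compute exactly. This small sketch (hypothetical helper `empirical_cdf`) uses `fractions.Fraction` so that the $n = 3$ example above, with data 1, 1, 2, gives exact probabilities.

```python
from fractions import Fraction

def empirical_cdf(data):
    """Return F_e, where F_e(x) = (number of i with x_i <= x) / n."""
    n = len(data)
    return lambda x: Fraction(sum(1 for xi in data if xi <= x), n)

F_e = empirical_cdf([1, 1, 2])      # the n = 3 example: x_1 = x_2 = 1, x_3 = 2
print(F_e(0.5), F_e(1), F_e(1.5), F_e(2))   # 0 2/3 2/3 1
```

The jump of size $2/3$ at 1 and of size $1/3$ at 2 are exactly the probabilities with which $X_e$ takes those values.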

Now if $F_e$ is "close" to $F$, as it should be when $n$ is large [indeed, the strong
law of large numbers implies that with probability 1, $F_e(x)$ converges to $F(x)$
as $n \to \infty$, and another result, known as the Glivenko–Cantelli theorem, states
that this convergence will, with probability 1, be uniform in $x$], then $\theta(F_e)$ will
probably be close to $\theta(F)$ (assuming that $\theta$ is, in some sense, a continuous
function of the distribution) and $\mathrm{MSE}(F)$ should approximately be equal to

$$\mathrm{MSE}(F_e) = E_{F_e}[(g(X_1, \ldots, X_n) - \theta(F_e))^2]$$

In the above expression the $X_i$ are to be regarded as being independent random
variables having distribution function $F_e$. The quantity $\mathrm{MSE}(F_e)$ is called the
bootstrap approximation to the mean square error $\mathrm{MSE}(F)$.


To obtain a feel for the effectiveness of the bootstrap approximation to the
mean square error, let us consider the one case where its use is not necessary,
namely, when estimating the mean of a distribution by the sample mean $\bar X$. (Its
use is not necessary in this case because there already is an effective way of
estimating the mean square error $E[(\bar X - \theta)^2] = \sigma^2/n$, namely, by using the
observed value of $S^2/n$.)


Example 7d Suppose we are interested in estimating $\theta(F) = E[X]$ by using
the sample mean $\bar X = \sum_{i=1}^n X_i/n$. If the observed data are $x_i$, $i = 1, \ldots, n$, then
the empirical distribution $F_e$ puts weight $1/n$ on each of the points $x_1, \ldots, x_n$
(combining weights if the $x_i$ are not all distinct). Hence the mean of $F_e$ is
$\theta(F_e) = \bar x = \sum_{i=1}^n x_i/n$, and thus the bootstrap estimate of the mean square
error, call it $\mathrm{MSE}(F_e)$, is given by

$$\mathrm{MSE}(F_e) = E_{F_e}\left[\left(\sum_{i=1}^n \frac{X_i}{n} - \bar x\right)^2\right]$$

where $X_1, \ldots, X_n$ are independent random variables each distributed according
to $F_e$. Since

$$E_{F_e}\left[\sum_{i=1}^n \frac{X_i}{n}\right] = E_{F_e}[X] = \bar x$$

it follows that

$$\mathrm{MSE}(F_e) = \mathrm{Var}_{F_e}\left(\sum_{i=1}^n \frac{X_i}{n}\right) = \frac{\mathrm{Var}_{F_e}(X)}{n}$$

Now

$$\begin{aligned}
\mathrm{Var}_{F_e}(X) &= E_{F_e}[(X - E_{F_e}[X])^2] \\
&= E_{F_e}[(X - \bar x)^2] \\
&= \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2
\end{aligned}$$

and so

$$\mathrm{MSE}(F_e) = \frac{\sum_{i=1}^n (x_i - \bar x)^2}{n^2}$$

which compares quite nicely with $S^2/n$, the usual estimate of the mean square
error. Indeed, because the observed value of $S^2/n$ is $\sum_{i=1}^n (x_i - \bar x)^2/[n(n-1)]$,
the bootstrap approximation is almost identical. □


If the data values are $X_i = x_i$, $i = 1, \ldots, n$, then, as the empirical distribution
function $F_e$ puts weight $1/n$ on each of the points $x_i$, it is usually easy to
compute the value of $\theta(F_e)$: for example, if the parameter of interest $\theta(F)$ was
the variance of the distribution $F$, then $\theta(F_e) = \mathrm{Var}_{F_e}(X) = \sum_{i=1}^n (x_i - \bar x)^2/n$. To
determine the bootstrap approximation to the mean square error we then have to
compute

$$\mathrm{MSE}(F_e) = E_{F_e}[(g(X_1, \ldots, X_n) - \theta(F_e))^2]$$

However, since the above expectation is to be computed under the assumption
that $X_1, \ldots, X_n$ are independent random variables distributed according to $F_e$, it
follows that the vector $(X_1, \ldots, X_n)$ is equally likely to take on any of the $n^n$
possible values $(x_{i_1}, x_{i_2}, \ldots, x_{i_n})$, $i_j \in \{1, 2, \ldots, n\}$, $j = 1, \ldots, n$. Therefore,

$$\mathrm{MSE}(F_e) = \sum_{i_n} \cdots \sum_{i_1} \frac{[g(x_{i_1}, \ldots, x_{i_n}) - \theta(F_e)]^2}{n^n}$$

where each $i_j$ goes from 1 to $n$, and so the computation of $\mathrm{MSE}(F_e)$ requires,
in general, summing $n^n$ terms, an impossible task when $n$ is large.
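For a very small data set the $n^n$-term sum can be evaluated by brute force, which makes the formula concrete and confirms the result of Example 7d. In this sketch the helper names are hypothetical and the four data values are made up for illustration; `itertools.product(data, repeat=n)` runs over every index vector $(i_1, \ldots, i_n)$, so repeated data values are weighted correctly.

```python
import itertools

def exact_bootstrap_mse(data, g, theta):
    """MSE(F_e) by brute force: average [g(x_{i1},...,x_{in}) - theta(F_e)]^2
    over all n**n equally likely index vectors (i_1, ..., i_n).

    theta(data) must return theta(F_e); practical only for very small n.
    """
    n = len(data)
    target = theta(data)
    total = 0.0
    for sample in itertools.product(data, repeat=n):   # n**n tuples
        total += (g(sample) - target) ** 2
    return total / n ** n

def sample_mean(xs):
    return sum(xs) / len(xs)

data = [5.0, 14.0, 9.0, 2.0]          # hypothetical observations, n = 4 (256 terms)
xbar = sample_mean(data)
formula = sum((x - xbar) ** 2 for x in data) / len(data) ** 2   # Example 7d
print(exact_bootstrap_mse(data, sample_mean, sample_mean), formula)   # 5.0625 5.0625
```

Already at $n = 10$ the sum would have $10^{10}$ terms, which is why simulation is used instead.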

However, as we know, there is an effective way to approximate the average of
a large number of terms, namely, by using simulation. Indeed, we could generate
a set of $n$ independent random variables $X_1^1, \ldots, X_n^1$ each having distribution
function $F_e$ and then set

$$Y_1 = [g(X_1^1, \ldots, X_n^1) - \theta(F_e)]^2$$

Next, we generate a second set $X_1^2, \ldots, X_n^2$ and compute

$$Y_2 = [g(X_1^2, \ldots, X_n^2) - \theta(F_e)]^2$$

and so on, until we have collected the variables $Y_1, Y_2, \ldots, Y_r$. Because these $Y_i$
are independent random variables having mean $\mathrm{MSE}(F_e)$, it follows that we can
use their average $\sum_{i=1}^r Y_i/r$ as an estimate of $\mathrm{MSE}(F_e)$.


Remarks 


1. It is quite easy to generate a random variable $X$ having distribution $F_e$. Because such a random variable should be equally likely to be $x_1, \ldots, x_n$, just generate a random number $U$ and set $X = x_J$, where $J = \mathrm{Int}(nU) + 1$. (It is easy to check that this will still work even when the $x_i$ are not all distinct.)

2. The above simulation allows us to approximate $\mathrm{MSE}(F_e)$, which is itself an approximation to the desired $\mathrm{MSE}(F)$. As such, it has been reported that roughly 100 simulation runs (that is, choosing $r = 100$) are usually sufficient. □
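A minimal sketch of the simulation just described, in Python, assuming a user-supplied estimator `g` and the value $\theta(F_e)$; the function name `mc_bootstrap_mse` is hypothetical. Each resampled value is drawn as in Remark 1, written with 0-based indexing so that $J - 1 = \mathrm{Int}(nU)$.

```python
import random


def mc_bootstrap_mse(xs, g, theta_fe, r=100, rng=None):
    """Approximate MSE(F_e) by the average of r values
    Y_j = [g(X_1^j, ..., X_n^j) - theta(F_e)]^2, each X drawn from F_e."""
    rng = rng or random.Random()
    n = len(xs)
    total = 0.0
    for _ in range(r):
        # Remark 1: X = x_J with J = Int(nU) + 1; 0-based index is int(n * U)
        resample = [xs[int(n * rng.random())] for _ in range(n)]
        total += (g(resample) - theta_fe) ** 2
    return total / r


def mean(v):
    return sum(v) / len(v)


xs = [1.0, 2.0, 6.0]   # hypothetical data; the exact bootstrap value is 14/9
print(mc_bootstrap_mse(xs, mean, mean(xs), r=100, rng=random.Random(1)))
```

With these hypothetical data the exact bootstrap value is $14/9$, so increasing `r` should bring the printed value toward it; $r = 100$ trades accuracy for speed.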


The following example illustrates the use of the bootstrap in analyzing the 
output of a queueing simulation. 


Example 7e Suppose in Example 7a that we are interested in estimating the long-run average amount of time a customer spends in the system. That is, letting $W_i$ be the amount of time the $i$th entering customer spends in the system, $i \ge 1$, we are interested in

$$\theta = \lim_{n \to \infty} \frac{W_1 + \cdots + W_n}{n}$$


To show that the above limit does indeed exist (note that the random variables $W_i$ are neither independent nor identically distributed), let $N_i$ denote the number of customers that arrive on day $i$, and let

$$D_1 = W_1 + \cdots + W_{N_1}$$
$$D_2 = W_{N_1 + 1} + \cdots + W_{N_1 + N_2}$$

and, in general, for $i \ge 2$,

$$D_i = W_{N_1 + \cdots + N_{i-1} + 1} + \cdots + W_{N_1 + \cdots + N_i}$$

In words, $D_i$ is the sum of the times in the system of all arrivals on day $i$. We can now express $\theta$ as

$$\theta = \lim_{m \to \infty} \frac{D_1 + D_2 + \cdots + D_m}{N_1 + N_2 + \cdots + N_m}$$

where the above follows because the ratio is just the average time in the system of all customers arriving in the first $m$ days. Upon dividing numerator and denominator by $m$, we obtain

$$\theta = \lim_{m \to \infty} \frac{(D_1 + \cdots + D_m)/m}{(N_1 + \cdots + N_m)/m}$$


Now as each day follows the same probability law, it follows that the random variables $D_1, \ldots, D_m$ are all independent and identically distributed, as are the random variables $N_1, \ldots, N_m$. Hence, by the strong law of large numbers, it follows that the average of the first $m$ of the $D_i$ will, with probability 1, converge to their common expectation, with a similar statement being true for the $N_i$. Therefore, we see that

$$\theta = \frac{E[D]}{E[N]}$$

where $E[N]$ is the expected number of customers to arrive in a day, and $E[D]$ is the expected sum of the times those customers spend in the system.

To estimate $\theta$ we can thus simulate the system over $k$ days, collecting on the $i$th run the data $N_i, D_i$, where $N_i$ is the number of customers arriving on day $i$ and $D_i$ is the sum of the times they spend in the system, $i = 1, \ldots, k$. Because the quantity $E[D]$ can then be estimated by

$$\bar D = \frac{D_1 + D_2 + \cdots + D_k}{k}$$

and $E[N]$ by

$$\bar N = \frac{N_1 + N_2 + \cdots + N_k}{k}$$

it follows that $\theta = E[D]/E[N]$ can be estimated by

$$\text{Estimate of } \theta = \frac{\bar D}{\bar N} = \frac{D_1 + \cdots + D_k}{N_1 + \cdots + N_k}$$

which, it should be noted, is just the average time in the system of all arrivals during the first $k$ days.
To estimate

$$\mathrm{MSE} = E\left[\left(\frac{\sum_{i=1}^k D_i}{\sum_{i=1}^k N_i} - \theta\right)^2\right]$$

we employ the bootstrap approach. Suppose the observed value of $D_i, N_i$ is $d_i, n_i$, $i = 1, \ldots, k$. That is, suppose that the simulation resulted in $n_i$ arrivals on day $i$ spending a total time $d_i$ in the system. Thus, the empirical joint distribution function of the random vector $D, N$ puts equal weight on the $k$ pairs $d_i, n_i$, $i = 1, \ldots, k$. That is, under the empirical distribution function we have

$$P_{F_e}\{D = d_i, N = n_i\} = \frac{1}{k}, \qquad i = 1, \ldots, k$$

Hence,

$$E_{F_e}[D] = \bar d = \sum_{i=1}^k d_i/k, \qquad E_{F_e}[N] = \bar n = \sum_{i=1}^k n_i/k$$

and thus,

$$\theta(F_e) = \frac{\bar d}{\bar n}$$

Hence,

$$\mathrm{MSE}(F_e) = E_{F_e}\left[\left(\frac{\sum_{i=1}^k D_i}{\sum_{i=1}^k N_i} - \frac{\bar d}{\bar n}\right)^2\right]$$

where the above is to be computed under the assumption that the $k$ pairs of random vectors $D_i, N_i$ are independently distributed according to $F_e$.

Since an exact computation of $\mathrm{MSE}(F_e)$ would require computing the sum of $k^k$ terms, we now perform a simulation experiment to approximate it. We generate $k$ independent pairs of random vectors $D_i^1, N_i^1$, $i = 1, \ldots, k$, according to the empirical distribution function $F_e$, and then compute

$$Y_1 = \left(\frac{\sum_{i=1}^k D_i^1}{\sum_{i=1}^k N_i^1} - \frac{\bar d}{\bar n}\right)^2$$

We then generate a second set $D_i^2, N_i^2$ and compute the corresponding $Y_2$. This continues until we have generated the $r$ values $Y_1, \ldots, Y_r$ (where $r = 100$ should suffice). The average of these $r$ values, $\sum_{i=1}^r Y_i/r$, is then used to estimate $\mathrm{MSE}(F_e)$, which is itself our estimate of MSE, the mean square error of our estimate of the average amount of time a customer spends in the system. □
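The bootstrap procedure of Example 7e can be sketched in Python as follows. The daily figures are invented for illustration and the function name `bootstrap_mse_ratio` is hypothetical; the key design point is that each day's pair $(d_i, n_i)$ is resampled as a unit, since $F_e$ is a joint distribution on the $k$ pairs.

```python
import random


def bootstrap_mse_ratio(d, n, r=100, rng=None):
    """Bootstrap estimate of the MSE of sum(D_i)/sum(N_i) (Example 7e).

    d[i], n[i] are the observed total time in system and number of arrivals
    on day i; pairs are resampled together because F_e puts mass 1/k on each pair."""
    rng = rng or random.Random()
    k = len(d)
    theta_fe = sum(d) / sum(n)                     # dbar / nbar
    total = 0.0
    for _ in range(r):
        idx = [int(k * rng.random()) for _ in range(k)]   # k pairs drawn from F_e
        y = sum(d[i] for i in idx) / sum(n[i] for i in idx) - theta_fe
        total += y * y                             # Y_j
    return total / r


# Hypothetical output of a k = 5 day simulation (not data from the text)
days_d = [52.3, 61.0, 47.8, 70.2, 58.9]
days_n = [20, 24, 19, 26, 22]
print(sum(days_d) / sum(days_n))                   # estimate of theta
print(bootstrap_mse_ratio(days_d, days_n, r=100, rng=random.Random(3)))
```

If every day had the same ratio $d_i/n_i$, every resample would reproduce $\bar d/\bar n$ exactly and the bootstrap MSE would be 0, which is a convenient sanity check.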


Remark The Regenerative Approach The foregoing analysis assumed that each day independently followed the same probability law. In certain applications, the same probability law describes the system not over days of fixed lengths but rather over cycles whose lengths are random. For example, consider a queueing system in which customers arrive in accordance with a Poisson process, and suppose that the first customer arrives at time 0. If the random time $T$ represents the next time that an arrival finds the system empty, then we say that the time from 0 to $T$ constitutes the first cycle. The second cycle would be the time from $T$ until the first time point after $T$ that an arrival finds the system empty, and so on. It is easy to see, in most models, that the movements of the process over each cycle are independent and identically distributed. Hence, if we regard a cycle as being a "day," then all of the preceding analysis remains valid. For example, $\theta$, the long-run average amount of time that a customer spends in the system, is given by $\theta = E[D]/E[N]$, where $D$ is the sum of the times in the system of all arrivals in a cycle and $N$ is the number of such arrivals. If we now generate $k$ cycles, our estimate of $\theta$ is still $\sum_{i=1}^k D_i / \sum_{i=1}^k N_i$. In addition, the mean square error of this estimate can be approximated by using the bootstrap approach exactly as above.

The technique of analyzing a system by simulating “cycles,” that is, random 
intervals during which the process follows the same probability law, is called 
the regenerative approach. 


Exercises 


1. For any set of numbers $x_1, \ldots, x_n$, prove algebraically that

$$\sum_{i=1}^n (x_i - \bar x)^2 = \sum_{i=1}^n x_i^2 - n\bar x^2$$

where $\bar x = \sum_{i=1}^n x_i/n$.


2. Give a probabilistic proof of the result of Exercise 1, by letting $X$ denote a random variable that is equally likely to take on any of the values $x_1, \ldots, x_n$, and then by applying the identity $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.


3. Write a program that uses the recursions given by Equations (7.6) and (7.7) 
to calculate the sample mean and sample variance of a data set. 


4. Continue to generate standard normal random variables until you have generated $n$ of them, where $n \ge 100$ is such that $S/\sqrt{n} < 0.1$, where $S$ is the sample standard deviation of the $n$ data values.


(a) How many normals do you think will be generated? 

(b) How many normals did you generate? 

(c) What is the sample mean of all the normals generated? 

(d) What is the sample variance? 

(e) Comment on the results of (c) and (d). Were they surprising? 


5. Repeat Exercise 4 with the exception that you now continue generating standard normals until $S/\sqrt{n} < 0.01$.


6. Estimate $\int_0^1 \exp(x^2)\,dx$ by generating random numbers. Generate at least 100 values and stop when the standard deviation of your estimator is less than 0.01.


7. To estimate $E[X]$, $X_1, \ldots, X_{16}$ have been simulated with the following values resulting: 10, 11, 10.5, 11.5, 14, 8, 13, 6, 15, 10, 11.5, 10.5, 12, 8, 16, 5. Based on these data, if we want the standard deviation of the estimator of $E[X]$ to be less than 0.1, roughly how many additional simulation runs will be needed?


Exercises 8 and 9 are concerned with estimating e. 
8. It can be shown that if we add random numbers until their sum exceeds 1, then the expected number added is equal to $e$. That is, if

$$N = \min\left\{n : \sum_{i=1}^n U_i > 1\right\}$$

then $E[N] = e$.

(a) Use the preceding to estimate $e$, using 1000 simulation runs.
(b) Estimate the variance of the estimator in (a) and give a 95 percent confidence interval estimate of $e$.


9. Consider a sequence of random numbers and let $M$ denote the first one that is less than its predecessor. That is,

$$M = \min\{n : U_1 < U_2 < \cdots < U_{n-1} > U_n\}$$

(a) Argue that $P\{M > n\} = 1/n!$, $n \ge 0$.

(b) Use the identity $E[M] = \sum_{n=0}^\infty P\{M > n\}$ to show that $E[M] = e$.

(c) Use part (b) to estimate $e$, using 1000 simulation runs.

(d) Estimate the variance of the estimator in (c) and give a 95 percent confidence interval estimate of $e$.


10. Use the approach that is presented in Example 3a of Chapter 3 to obtain an interval of size less than 0.1, which we can assert, with 95 percent confidence, contains $\pi$. How many runs were necessary?


11. Repeat Exercise 10 when we want the interval to be no greater than 0.01. 


12. To estimate $\theta$, we generated 20 independent values having mean $\theta$. If the successive values obtained were

102, 112, 131, 107, 114, 95, 133, 145, 139, 117
93, 111, 124, 122, 136, 141, 119, 122, 151, 143

how many additional random variables do you think we will have to generate if we want to be 99 percent certain that our final estimate of $\theta$ is correct to within $\pm 0.5$?


13. Let $X_1, \ldots, X_n$ be independent and identically distributed random variables having unknown mean $\mu$. For given constants $a < b$, we are interested in estimating $p = P\{a < \sum_{i=1}^n X_i/n - \mu < b\}$.

(a) Explain how we can use the bootstrap approach to estimate $p$.
(b) Estimate $p$ if $n = 10$ and the values of the $X_i$ are 56, 101, 78, 67, 93, 87, 64, 72, 80, and 69. Take $a = -5$, $b = 5$.
In the following three exercises $X_1, \ldots, X_n$ is a sample from a distribution whose variance is (the unknown) $\sigma^2$. We are planning to estimate $\sigma^2$ by the sample variance $S^2 = \sum_{i=1}^n (X_i - \bar X)^2/(n-1)$, and we want to use the bootstrap technique to estimate $\mathrm{Var}(S^2)$.

14. If $n = 2$ and $X_1 = 1$ and $X_2 = 3$, what is the bootstrap estimate of $\mathrm{Var}(S^2)$?

15. If $n = 15$ and the data are

5, 4, 9, 6, 21, 17, 11, 20, 7, 10, 21, 15, 13, 16, 8

approximate (by a simulation) the bootstrap estimate of $\mathrm{Var}(S^2)$.


16. Consider a single-server system in which potential customers arrive in 
accordance with a Poisson process having rate 4.0. A potential customer will 
only enter if there are three or fewer other customers in the system when he 
or she arrives. The service time of a customer is exponential with rate 4.2. No 
additional customers are allowed in after time T = 8. (All time units are per 
hour.) Develop a simulation study to estimate the average amount of time that an 
entering customer spends in the system. Using the bootstrap approach, estimate 
the mean square error of your estimator. 
Variance Reduction Techniques


Introduction 


In a typical scenario for a simulation study, one is interested in determining $\theta$, a parameter connected with some stochastic model. To estimate $\theta$, the model is simulated to obtain, among other things, the output datum $X$ which is such that $\theta = E[X]$. Repeated simulation runs, the $i$th one yielding the output variable $X_i$, are performed. The simulation study is then terminated when $n$ runs have been performed and the estimate of $\theta$ is given by $\bar X = \sum_{i=1}^n X_i/n$. Because this results in an unbiased estimate of $\theta$, it follows that its mean square error is equal to its variance. That is,

$$\mathrm{MSE} = E[(\bar X - \theta)^2] = \mathrm{Var}(\bar X) = \frac{\mathrm{Var}(X)}{n}$$

Hence, if we can obtain a different unbiased estimate of $\theta$ having a smaller variance than does $\bar X$, we would obtain an improved estimator.

In this chapter we present a variety of different methods that one can attempt to use so as to reduce the variance of the (so-called raw) simulation estimate $\bar X$.

However, before presenting these variance reduction techniques, let us illus- 
trate the potential pitfalls, even in quite simple models, of using the raw simu- 
lation estimator. 


Example 8a Quality Control Consider a process that produces items 
sequentially. Suppose that these items have measurable values attached to them 
and that when the process is “in control” these values (suitably normalized) 
come from a standard normal distribution. Suppose further that when the process 
goes “out of control” the distribution of these values changes from the standard 
normal to some other distribution. 
To help detect when the process goes out of control, the following type of procedure, called an exponentially weighted moving-average control rule, is often used. Let $X_1, X_2, \ldots$ denote the sequence of data values. For a fixed value $\alpha$, $0 < \alpha < 1$, define the sequence $S_n$, $n \ge 0$, by

$$S_0 = 0, \qquad S_n = \alpha S_{n-1} + (1 - \alpha)X_n, \quad n \ge 1$$

Now when the process is in control, all the $X_n$ have mean 0, and thus it is easy to verify that, under this condition, the exponentially weighted moving-average values $S_n$ also have mean 0. The moving-average control rule is to fix a constant $B$, along with the value of $\alpha$, and then to declare the process "out of control" when $|S_n|$ exceeds $B$. That is, the process is declared out of control at the random time $N$, where

$$N = \min\{n : |S_n| > B\}$$

Now it is clear that eventually $|S_n|$ will exceed $B$ and so the process will be declared out of control even if it is still working properly, that is, even when the data values are being generated by a standard normal distribution. To make sure that this does not occur too frequently, it is prudent to choose $\alpha$ and $B$ so that, when the $X_n$, $n \ge 1$, are indeed coming from a standard normal distribution, $E[N]$ is large. Suppose that it has been decided that, under these conditions, a value for $E[N]$ of 800 is acceptable. Suppose further that it is claimed that the values $\alpha = 0.9$ and $B = 0.8$ achieve a value of $E[N]$ of around 800. How can we check this claim?

One way of verifying the above claim is by simulation. Namely, we can generate standard normals $X_n$, $n \ge 1$, until $|S_n|$ exceeds 0.8 (where $\alpha = 0.9$ in the defining equation for $S_n$). If $N_1$ denotes the number of normals needed until this occurs, then, for our first simulation run, we have the output variable $N_1$. We then generate other runs, and our estimate of $E[N]$ is the average value of the output data obtained over all runs.

However, let us suppose that we want to be 99 percent confident that our estimate of $E[N]$, under the in-control assumption, is accurate to within $\pm 0.1$. Hence, since 99 percent of the time a normal random variable is within $\pm 2.58$ standard deviations of its mean (i.e., $z_{0.005} = 2.58$), it follows that the number of runs needed, call it $n$, is such that

$$\frac{2.58\,\sigma_n}{\sqrt{n}} \approx 0.1$$

where $\sigma_n$ is the sample standard deviation based on the first $n$ data values. Now $\sigma_n$ will approximately equal $\sigma(N)$, the standard deviation of $N$, and we now argue that this is approximately equal to $E[N]$. The argument runs as follows: Since we are assuming that the process remains in control throughout, most of the time the value of the exponentially weighted moving average is near the origin. Occasionally, by chance, it gets large and approaches, in absolute value, $B$. At such times it may go beyond $B$ and the run ends, or there may be a string of normal data values which, after a short time, eliminate the fact that the moving average had been large (this is so because the old values of $S_i$ are continually multiplied by 0.9 and so lose their effect). Hence, if we know that the process has not yet gone out of control by some fixed time $k$, then, no matter what the value of $k$, it would seem that the value of $S_k$ is around the origin. In other words, it intuitively appears that the distribution of time until the moving average exceeds the control limits is approximately memoryless; that is, it is approximately an exponential random variable. But for an exponential random variable $Y$, $\mathrm{Var}(Y) = (E[Y])^2$. Since the standard deviation is the square root of the variance, it thus seems intuitive that, when in control throughout, $\sigma(N) \approx E[N]$. Hence, if the original claim that $E[N] \approx 800$ is correct, the number of runs needed is such that

$$\sqrt{n} \approx 25.8 \times 800$$

or

$$n \approx (25.8 \times 800)^2 \approx 4.26 \times 10^8$$

In addition, because each run requires approximately 800 normal random variables (again assuming the claim is roughly correct), we see that to do this simulation would require approximately $800 \times 4.26 \times 10^8 \approx 3.41 \times 10^{11}$ normal random variables, a formidable task. □
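The arithmetic above, and a single run of the in-control EWMA rule, can be sketched in Python. The function names are hypothetical; the small pilot study of 20 runs is only meant to show how $N$ is generated and how one might obtain a rough value of $\sigma(N)$ to plug into the run-count formula, not to settle the claim $E[N] \approx 800$.

```python
import math
import random


def ewma_run_length(alpha=0.9, bound=0.8, rng=None, max_steps=10**8):
    """One in-control run of the EWMA rule: return N = min{n : |S_n| > B}."""
    rng = rng or random.Random()
    s = 0.0                                                  # S_0 = 0
    for n in range(1, max_steps + 1):
        s = alpha * s + (1 - alpha) * rng.gauss(0.0, 1.0)    # X_n ~ N(0, 1)
        if abs(s) > bound:
            return n
    raise RuntimeError("no out-of-control signal within max_steps")


def runs_needed(sigma_n, half_width=0.1, z=2.58):
    """Solve z * sigma_n / sqrt(n) = half_width for n."""
    return (z * sigma_n / half_width) ** 2


n_runs = runs_needed(800.0)          # taking sigma(N) ~ E[N] ~ 800
print(n_runs, 800 * n_runs)          # about 4.26e8 runs and 3.41e11 normals

rng = random.Random(11)
lengths = [ewma_run_length(rng=rng) for _ in range(20)]      # small pilot study
m = sum(lengths) / len(lengths)
sd = math.sqrt(sum((x - m) ** 2 for x in lengths) / (len(lengths) - 1))
print(m, sd)                         # rough pilot values of E[N] and sigma(N)
```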
8.1 The Use of Antithetic Variables

Suppose we are interested in using simulation to estimate $\theta = E[X]$ and suppose we have generated $X_1$ and $X_2$, identically distributed random variables having mean $\theta$. Then

$$\mathrm{Var}\left(\frac{X_1 + X_2}{2}\right) = \frac{1}{4}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 2\,\mathrm{Cov}(X_1, X_2)\right]$$

Hence it would be advantageous (in the sense that the variance would be reduced) if $X_1$ and $X_2$, rather than being independent, were negatively correlated.

To see how we might arrange for $X_1$ and $X_2$ to be negatively correlated, suppose that $X_1$ is a function of $m$ random numbers: that is, suppose that

$$X_1 = h(U_1, U_2, \ldots, U_m)$$

where $U_1, \ldots, U_m$ are $m$ independent random numbers. Now if $U$ is a random number (that is, $U$ is uniformly distributed on $(0, 1)$), then so is $1 - U$. Hence the random variable

$$X_2 = h(1 - U_1, 1 - U_2, \ldots, 1 - U_m)$$

has the same distribution as $X_1$. In addition, since $1 - U$ is clearly negatively correlated with $U$, we might hope that $X_2$ might be negatively correlated with $X_1$; and indeed that result can be proved in the special case where $h$ is a monotone (either increasing or decreasing) function of each of its coordinates. [This result follows from a more general result which states that two increasing (or decreasing) functions of a set of independent random variables are positively correlated. Both results are presented in the Appendix to this chapter.] Hence, in this case, after we have generated $U_1, \ldots, U_m$ so as to compute $X_1$, rather than generating a new independent set of $m$ random numbers, we do better by just using the set $1 - U_1, \ldots, 1 - U_m$ to compute $X_2$. In addition, it should be noted that we obtain a double benefit: namely, not only does our resulting estimator have smaller variance (at least when $h$ is a monotone function), but we are also saved the time of generating a second set of random numbers.


Example 8b Simulating the Reliability Function Consider a system of $n$ components, each of which is either functioning or failed. Letting

$$s_i = \begin{cases} 1 & \text{if component } i \text{ works} \\ 0 & \text{otherwise} \end{cases}$$

we call $\mathbf{s} = (s_1, \ldots, s_n)$ the state vector. Suppose also that there is a nondecreasing function $\phi(s_1, \ldots, s_n)$ such that

$$\phi(s_1, \ldots, s_n) = \begin{cases} 1 & \text{if the system works under state vector } s_1, \ldots, s_n \\ 0 & \text{otherwise} \end{cases}$$

The function $\phi(s_1, \ldots, s_n)$ is called the structure function.
Some common structure functions are the following:

(a) The series structure: For the series structure

$$\phi(s_1, \ldots, s_n) = \min_i s_i$$

The series system works only if all its components function.

(b) The parallel structure: For the parallel structure

$$\phi(s_1, \ldots, s_n) = \max_i s_i$$

Hence the parallel system works if at least one of its components works.

(c) The k-of-n system: The structure function

$$\phi(s_1, \ldots, s_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^n s_i \ge k \\ 0 & \text{otherwise} \end{cases}$$

is called a k-of-n structure function. Since $\sum_{i=1}^n s_i$ represents the number of functioning components, a k-of-n system works if at least $k$ of the $n$ components are working. It should be noted that a series system is an n-of-n system, whereas a parallel system is a 1-of-n system.

(d) The bridge structure: A five-component system for which

$$\phi(s_1, s_2, s_3, s_4, s_5) = \max(s_1 s_3 s_5,\; s_2 s_3 s_4,\; s_1 s_4,\; s_2 s_5)$$

is said to have a bridge structure. Such a system can be represented schematically by Figure 8.1. The idea of the diagram is that the system functions if a signal can go, from left to right, through the system. The signal can go through any given node $i$ provided that component $i$ is functioning. We leave it as an exercise for the reader to verify the formula given for the bridge structure function.


Let us suppose now that the states of the components, call them $S_i$, $i = 1, \ldots, n$, are independent random variables such that

$$P\{S_i = 1\} = p_i = 1 - P\{S_i = 0\}, \qquad i = 1, \ldots, n$$

Let

$$r(p_1, \ldots, p_n) = P\{\phi(S_1, \ldots, S_n) = 1\} = E[\phi(S_1, \ldots, S_n)]$$

The function $r(p_1, \ldots, p_n)$ is called the reliability function. It represents the probability that the system will work when the components are independent with component $i$ functioning with probability $p_i$, $i = 1, \ldots, n$.


Figure 8.1. The bridge structure. 
For a series system

$$r(p_1, \ldots, p_n) = P\{S_i = 1 \text{ for all } i = 1, \ldots, n\} = \prod_{i=1}^n P\{S_i = 1\} = \prod_{i=1}^n p_i$$

and for a parallel system

$$\begin{aligned} r(p_1, \ldots, p_n) &= P\{S_i = 1 \text{ for at least one } i,\ i = 1, \ldots, n\} \\ &= 1 - P\{S_i = 0 \text{ for all } i = 1, \ldots, n\} \\ &= 1 - \prod_{i=1}^n P\{S_i = 0\} \\ &= 1 - \prod_{i=1}^n (1 - p_i) \end{aligned}$$

However, for most systems it remains a formidable problem to compute the reliability function (even for such small systems as a 5-of-10 system or the bridge system it can be quite tedious to compute). So let us suppose that for a given nondecreasing structure function $\phi$ and given probabilities $p_1, \ldots, p_n$, we are interested in using simulation to estimate

$$r(p_1, \ldots, p_n) = E[\phi(S_1, \ldots, S_n)]$$

Now we can simulate the $S_i$ by generating uniform random numbers $U_1, \ldots, U_n$ and then setting

$$S_i = \begin{cases} 1 & \text{if } U_i < p_i \\ 0 & \text{otherwise} \end{cases}$$

Hence we see that

$$\phi(S_1, \ldots, S_n) = h(U_1, \ldots, U_n)$$

where $h$ is a decreasing function of $U_1, \ldots, U_n$. Therefore

$$\mathrm{Cov}(h(\mathbf{U}), h(\mathbf{1} - \mathbf{U})) \le 0$$

and so the antithetic variable approach of using $U_1, \ldots, U_n$ to generate both $h(U_1, \ldots, U_n)$ and $h(1 - U_1, \ldots, 1 - U_n)$ results in a smaller variance than if an independent set of random numbers were used to generate the second value of $h$. □
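A sketch of the antithetic estimator for the bridge system, in Python, assuming all $p_i = 1/2$ (a choice made here only because the exact value $r = 1/2$ is then known). It compares averages of antithetic pairs $(h(\mathbf U) + h(\mathbf 1 - \mathbf U))/2$ with averages of independent pairs; for this choice of $p_i$ the antithetic pair variance should come out near half of the independent one.

```python
import random
import statistics


def bridge(s):
    s1, s2, s3, s4, s5 = s
    return max(s1 * s3 * s5, s2 * s3 * s4, s1 * s4, s2 * s5)


def states(us, p):
    """S_i = 1 if U_i < p_i, else 0."""
    return [1 if u < pi else 0 for u, pi in zip(us, p)]


def paired_estimates(phi, p, pairs, antithetic, rng):
    """Return the pair averages (X_1 + X_2)/2 for `pairs` pairs of runs."""
    out = []
    for _ in range(pairs):
        us = [rng.random() for _ in p]
        x1 = phi(states(us, p))
        if antithetic:
            x2 = phi(states([1.0 - u for u in us], p))       # reuse 1 - U
        else:
            x2 = phi(states([rng.random() for _ in p], p))   # fresh numbers
        out.append((x1 + x2) / 2)
    return out


p = [0.5] * 5
anti = paired_estimates(bridge, p, 20000, True, random.Random(5))
indep = paired_estimates(bridge, p, 20000, False, random.Random(6))
print(statistics.mean(anti), statistics.variance(anti))
print(statistics.mean(indep), statistics.variance(indep))
```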
Oftentimes the relevant output of a simulation is a function of the input random variables $Y_1, \ldots, Y_m$. That is, the relevant output is $X = h(Y_1, \ldots, Y_m)$. Suppose $Y_i$ has distribution $F_i$, $i = 1, \ldots, m$. If these input variables are generated by the inverse transform technique, we can write

$$X = h\left(F_1^{-1}(U_1), \ldots, F_m^{-1}(U_m)\right)$$

where $U_1, \ldots, U_m$ are independent random numbers. Since a distribution function is increasing, it follows that its inverse is also increasing and thus if $h(y_1, \ldots, y_m)$ were a monotone function of its coordinates, then it follows that $h(F_1^{-1}(U_1), \ldots, F_m^{-1}(U_m))$ will be a monotone function of the $U_i$. Hence the method of antithetic variables, which would first generate $U_1, \ldots, U_m$ to compute $X_1$ and then use $1 - U_1, \ldots, 1 - U_m$ to compute $X_2$, would result in an estimator having a smaller variance than would have been obtained if a new set of random numbers were used for $X_2$.


Example 8c Simulating a Queueing System Consider a given queueing system, let $D_i$ denote the delay in queue of the $i$th arriving customer, and suppose we are interested in simulating the system so as to estimate $\theta = E[X]$, where

$$X = D_1 + \cdots + D_n$$

is the sum of the delays in queue of the first $n$ arrivals. Let $I_1, \ldots, I_n$ denote the first $n$ interarrival times (i.e., $I_j$ is the time between the arrivals of customers $j - 1$ and $j$), and let $S_1, \ldots, S_n$ denote the first $n$ service times of this system, and suppose that these random variables are all independent. Now in many systems $X$ is a function of the $2n$ random variables $I_1, \ldots, I_n, S_1, \ldots, S_n$. Say,

$$X = h(I_1, \ldots, I_n, S_1, \ldots, S_n)$$

Also, as the delay in queue of a given customer usually increases (depending of course on the specifics of the model) as the service times of other customers increase and usually decreases as the times between arrivals increase, it follows that, for many models, $h$ is a monotone function of its coordinates. Hence, if the inverse transform method is used to generate the random variables $I_1, \ldots, I_n, S_1, \ldots, S_n$, then the antithetic variable approach results in a smaller variance. That is, if we initially use the $2n$ random numbers $U_i$, $i = 1, \ldots, 2n$, to generate the interarrival and service times by setting $I_i = F_i^{-1}(U_i)$, $S_i = G_i^{-1}(U_{n+i})$, where $F_i$ and $G_i$ are, respectively, the distribution functions of $I_i$ and $S_i$, then the second simulation run should be done in the same fashion, but using the random numbers $1 - U_i$, $i = 1, \ldots, 2n$. This results in a smaller variance than if a new set of $2n$ random numbers were generated for the second run. □
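A Python sketch of Example 8c under extra assumptions that are not in the text: a single-server first-come first-served queue that starts empty, with exponential interarrival times (rate `lam`) and exponential service times (rate `mu`), so the inverse transforms are $F^{-1}(u) = -\log(1 - u)/\lambda$ and $G^{-1}(u) = -\log(1 - u)/\mu$. The delays follow the standard Lindley recursion $D_{j+1} = \max(0, D_j + S_j - I_{j+1})$, and the antithetic run reuses $1 - U_i$ for all $2n$ inputs.

```python
import math
import random
import statistics


def total_delay(us, n, lam, mu):
    """X = D_1 + ... + D_n for a FIFO single-server queue that starts empty.
    us[:n] drive the interarrival times, us[n:] the service times (inverse transform)."""
    inter = [-math.log(1.0 - u) / lam for u in us[:n]]   # I_j = F^{-1}(U_j)
    serv = [-math.log(1.0 - u) / mu for u in us[n:]]     # S_j = G^{-1}(U_{n+j})
    d = 0.0                                              # D_1 = 0
    total = d
    for j in range(1, n):
        # Lindley recursion (0-based j): delay of customer j from customer j - 1
        d = max(0.0, d + serv[j - 1] - inter[j])
        total += d
    return total


def uniform_open(rng):
    """A random number strictly inside (0, 1), so both u and 1 - u give finite logs."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def pair_averages(pairs, n, lam, mu, antithetic, rng):
    out = []
    for _ in range(pairs):
        us = [uniform_open(rng) for _ in range(2 * n)]
        x1 = total_delay(us, n, lam, mu)
        if antithetic:
            us2 = [1.0 - u for u in us]                  # second run uses 1 - U_i
        else:
            us2 = [uniform_open(rng) for _ in range(2 * n)]
        out.append((x1 + total_delay(us2, n, lam, mu)) / 2)
    return out


anti = pair_averages(5000, 10, 1.0, 1.25, True, random.Random(21))
indep = pair_averages(5000, 10, 1.0, 1.25, False, random.Random(22))
print(statistics.mean(anti), statistics.variance(anti))
print(statistics.mean(indep), statistics.variance(indep))
```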


The following example illustrates the sort of improvement that can sometimes be gained by the use of antithetic variables.


Example 8d Suppose we were interested in using simulation to estimate

$$\theta = E[e^U] = \int_0^1 e^x\,dx$$

(Of course, we know that $\theta = e - 1$; however, the point of this example is to see what kind of improvement is possible by using antithetic variables.) Since the function $h(u) = e^u$ is clearly a monotone function, the antithetic variable approach leads to a variance reduction, whose value we now determine. To begin, note that

$$\mathrm{Cov}(e^U, e^{1-U}) = E[e^U e^{1-U}] - E[e^U]E[e^{1-U}] = e - (e - 1)^2 \approx -0.2342$$

Also, because

$$\mathrm{Var}(e^U) = E[e^{2U}] - (E[e^U])^2 = \int_0^1 e^{2x}\,dx - (e - 1)^2 = \frac{e^2 - 1}{2} - (e - 1)^2 \approx 0.2420$$

we see that the use of independent random numbers results in a variance of

$$\mathrm{Var}\left(\frac{e^{U_1} + e^{U_2}}{2}\right) = \frac{\mathrm{Var}(e^U)}{2} \approx 0.1210$$

whereas the use of the antithetic variables $U$ and $1 - U$ gives a variance of

$$\mathrm{Var}\left(\frac{e^U + e^{1-U}}{2}\right) = \frac{\mathrm{Var}(e^U)}{2} + \frac{\mathrm{Cov}(e^U, e^{1-U})}{2} \approx 0.0039$$

a variance reduction of 96.7 percent. □
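The two variances of Example 8d can be checked empirically. The Python sketch below (seed and sample size are arbitrary choices) compares pairs built from two independent random numbers with pairs built from $U$ and $1 - U$.

```python
import math
import random
import statistics

rng = random.Random(2024)
pairs = 50000
indep, anti = [], []
for _ in range(pairs):
    u1, u2 = rng.random(), rng.random()
    indep.append((math.exp(u1) + math.exp(u2)) / 2)         # two independent numbers
    anti.append((math.exp(u1) + math.exp(1.0 - u1)) / 2)    # U and 1 - U

print(statistics.variance(indep))      # should be near 0.1210
print(statistics.variance(anti))       # should be near 0.0039
print(statistics.mean(anti), math.e - 1)
```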


Example 8e Estimating e Consider a sequence of random numbers and let $N$ be the first one that is greater than its immediate predecessor. That is,

$$N = \min(n : n \ge 2,\ U_n > U_{n-1})$$
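Now,

$$P\{N > n\} = P\{U_1 > U_2 > \cdots > U_n\} = 1/n!$$

where the final equality follows because all possible orderings of $U_1, \ldots, U_n$ are equally likely. Hence,

$$P\{N = n\} = P\{N > n - 1\} - P\{N > n\} = \frac{1}{(n-1)!} - \frac{1}{n!} = \frac{n-1}{n!}$$

and so

$$E[N] = \sum_{n=2}^\infty \frac{1}{(n-2)!} = e$$

Also,

$$E[N^2] = \sum_{n=2}^\infty \frac{n}{(n-2)!} = \sum_{n=2}^\infty \frac{2}{(n-2)!} + \sum_{n=3}^\infty \frac{n-2}{(n-2)!} = 2e + \sum_{n=3}^\infty \frac{1}{(n-3)!} = 3e$$

and so

$$\mathrm{Var}(N) = 3e - e^2 \approx 0.7658$$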


Hence, $e$ can be estimated by generating random numbers and stopping the first time one exceeds its immediate predecessor.

If we employ antithetic variables, then we could also let

$$M = \min(n : n \ge 2,\ 1 - U_n > 1 - U_{n-1}) = \min(n : n \ge 2,\ U_n < U_{n-1})$$

Since one of the values of $N$ and $M$ will equal 2 and the other will exceed 2, it would seem, even though they are not monotone functions of the $U_n$, that the estimator $(N + M)/2$ should have a smaller variance than the average of two independent random variables distributed according to $N$. Before determining $\mathrm{Var}(N + M)$, it is useful to first consider the random variable $N_a$, whose distribution is the same as the conditional distribution of the number of additional random numbers that must be observed until one is observed greater than its predecessor, given that $U_2 < U_1$. Therefore, we may write

$$N = \begin{cases} 2, & \text{with probability } \tfrac{1}{2} \\ 2 + N_a, & \text{with probability } \tfrac{1}{2} \end{cases}$$

Hence,

$$E[N] = 2 + \frac{1}{2}E[N_a]$$

$$E[N^2] = \frac{1}{2}\cdot 4 + \frac{1}{2}E\left[(2 + N_a)^2\right] = 4 + 2E[N_a] + \frac{1}{2}E[N_a^2]$$

Using the previously obtained results for $E[N]$ and $\mathrm{Var}(N)$ we obtain, after some algebra, that

$$E[N_a] = 2e - 4, \qquad E[N_a^2] = 8 - 2e$$

implying that

$$\mathrm{Var}(N_a) = 14e - 4e^2 - 8 \approx 0.4997$$

Now consider the random variables $N$ and $M$. It is easy to see that after the first two random numbers are observed, one of $N$ and $M$ will equal 2 and the other will equal 2 plus a random variable that has the same distribution as $N_a$. Hence,

$$\mathrm{Var}(N + M) = \mathrm{Var}(4 + N_a) = \mathrm{Var}(N_a)$$

Hence, if $N_1$ and $N_2$ are independent random variables each distributed according to $N$,

$$\frac{\mathrm{Var}(N_1 + N_2)}{\mathrm{Var}(N + M)} \approx \frac{1.5316}{0.4997} \approx 3.065$$

Thus, the use of antithetic variables reduces the variance of the estimator by a factor of slightly more than 3. □
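A Python sketch of Example 8e: one shared sequence of random numbers yields both $N$ and $M$, since after the first two values one of them equals 2 and the other is found by continuing until the direction of the sequence first reverses. The printed ratio of variances should be near the value of about 3.07 derived above; seed and run count are arbitrary choices.

```python
import math
import random
import statistics


def n_and_m(rng):
    """From one sequence of random numbers return (N, M), where
    N = min{n >= 2: U_n > U_{n-1}} and M = min{n >= 2: U_n < U_{n-1}}."""
    u1, u2 = rng.random(), rng.random()
    rising = u2 > u1                  # if True then N = 2, otherwise M = 2
    prev, count = u2, 2
    while True:                       # continue until the first reversal
        count += 1
        u = rng.random()
        turned = (u < prev) if rising else (u > prev)
        if turned:
            break
        prev = u
    return (2, count) if rising else (count, 2)


rng = random.Random(8)
runs = 40000
anti, indep = [], []
for _ in range(runs):
    n, m = n_and_m(rng)
    anti.append((n + m) / 2)                                  # antithetic pair
    indep.append((n_and_m(rng)[0] + n_and_m(rng)[0]) / 2)     # two independent N's

print(statistics.mean(anti), math.e)
print(statistics.variance(indep) / statistics.variance(anti))
```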


In the case of a normal random variable having mean $\mu$ and variance $\sigma^2$, we can use the antithetic variable approach by first generating such a random variable $Y$ and then taking as the antithetic variable $2\mu - Y$, which is also normal with mean $\mu$ and variance $\sigma^2$ and is clearly negatively correlated with $Y$. If we were using simulation to compute $E[h(Y_1, \ldots, Y_n)]$, where the $Y_i$ are independent normal random variables with means $\mu_i$, $i = 1, \ldots, n$, and $h$ is a monotone function of its coordinates, then the antithetic approach of first generating the $n$ normals $Y_1, \ldots, Y_n$ to compute $h(Y_1, \ldots, Y_n)$ and then using the antithetic variables $2\mu_i - Y_i$, $i = 1, \ldots, n$, to compute the next simulated value of $h$ would lead to a reduction in variance as compared with generating a second set of $n$ normal random variables.
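The normal case can be sketched in Python as well. As an illustrative assumption we take $h(y_1, \ldots, y_n) = \exp(y_1 + \cdots + y_n)$, which is increasing in each coordinate and has the known mean $\exp(\sum_i \mu_i + \sum_i \sigma_i^2/2)$, so the estimates can be checked; the means and standard deviations chosen are arbitrary.

```python
import math
import random
import statistics


def h(ys):
    """An increasing function of each coordinate: exp(y_1 + ... + y_n)."""
    return math.exp(sum(ys))


def pair_averages(mus, sigmas, pairs, antithetic, rng):
    out = []
    for _ in range(pairs):
        ys = [rng.gauss(m, s) for m, s in zip(mus, sigmas)]
        if antithetic:
            ys2 = [2 * m - y for m, y in zip(mus, ys)]            # 2*mu_i - Y_i
        else:
            ys2 = [rng.gauss(m, s) for m, s in zip(mus, sigmas)]  # fresh normals
        out.append((h(ys) + h(ys2)) / 2)
    return out


mus, sigmas = [0.0, 0.5], [0.5, 0.5]
exact = math.exp(sum(mus) + sum(s * s for s in sigmas) / 2)   # e^{0.75}
anti = pair_averages(mus, sigmas, 20000, True, random.Random(4))
indep = pair_averages(mus, sigmas, 20000, False, random.Random(9))
print(exact, statistics.mean(anti), statistics.mean(indep))
print(statistics.variance(anti), statistics.variance(indep))
```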
8.2 The Use of Control Variates

Again suppose that we want to use simulation to estimate $\theta = E[X]$, where $X$ is the output of a simulation. Now suppose that for some other output variable $Y$, the expected value of $Y$ is known, say, $E[Y] = \mu_y$. Then for any constant $c$, the quantity

$$X + c(Y - \mu_y)$$

is also an unbiased estimator of $\theta$. To determine the best value of $c$, note that

$$\mathrm{Var}(X + c(Y - \mu_y)) = \mathrm{Var}(X + cY) = \mathrm{Var}(X) + c^2\,\mathrm{Var}(Y) + 2c\,\mathrm{Cov}(X, Y)$$

Simple calculus now shows that the above is minimized when $c = c^*$, where

$$c^* = -\frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)} \tag{8.1}$$

and for this value the variance of the estimator is

$$\mathrm{Var}(X + c^*(Y - \mu_y)) = \mathrm{Var}(X) - \frac{[\mathrm{Cov}(X, Y)]^2}{\mathrm{Var}(Y)} \tag{8.2}$$

The quantity $Y$ is called a control variate for the simulation estimator $X$. To see why it works, note that $c^*$ is negative (positive) when $X$ and $Y$ are positively (negatively) correlated. So suppose that $X$ and $Y$ were positively correlated, meaning, roughly, that $X$ is large when $Y$ is large and vice versa. Hence, if a simulation run results in a large (small) value of $Y$, which is indicated by $Y$ being larger (smaller) than its known mean $\mu_y$, then it is probably true that $X$ is also larger (smaller) than its mean $\theta$, and so we would like to correct for this by lowering (raising) the value of the estimator $X$, and this is done since $c^*$ is negative (positive). A similar argument holds when $X$ and $Y$ are negatively correlated.

Upon dividing Equation (8.2) by $\mathrm{Var}(X)$, we obtain that

$$\frac{\mathrm{Var}(X + c^*(Y - \mu_y))}{\mathrm{Var}(X)} = 1 - \mathrm{Corr}^2(X, Y)$$

where

$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

is the correlation between $X$ and $Y$. Hence, the variance reduction obtained in using the control variate $Y$ is $100\,\mathrm{Corr}^2(X, Y)$ percent.

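The quantities $\mathrm{Cov}(X, Y)$ and $\mathrm{Var}(Y)$ are usually not known in advance and must be estimated from the simulated data. If $n$ simulation runs are performed, and the output data $X_i, Y_i$, $i = 1, \ldots, n$, result, then using the estimators

$$\widehat{\mathrm{Cov}}(X, Y) = \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)/(n - 1)$$

and

$$\widehat{\mathrm{Var}}(Y) = \sum_{i=1}^n (Y_i - \bar Y)^2/(n - 1),$$

we can approximate $c^*$ by $\hat c^*$, where

$$\hat c^* = -\frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^n (Y_i - \bar Y)^2}$$

The variance of the controlled estimator

$$\mathrm{Var}\left(\bar X + c^*(\bar Y - \mu_y)\right) = \frac{1}{n}\left(\mathrm{Var}(X) - \frac{\mathrm{Cov}^2(X, Y)}{\mathrm{Var}(Y)}\right)$$

can then be estimated by using the estimator of $\mathrm{Cov}(X, Y)$ along with the sample variance estimators of $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$.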
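A minimal Python sketch of the estimated control-variate procedure, assuming the outputs $X_i, Y_i$ are held in two lists; `control_variate` is a hypothetical helper name. Borrowing the integrand of Example 8d, it takes $X = e^U$ with control $Y = U$ and $\mu_y = 1/2$; using $\mathrm{Cov}(e^U, U) \approx 0.14086$ from Example 8h, $c^* = -12(0.14086) \approx -1.69$ and, by (8.2), the per-run controlled variance is roughly 0.0039.

```python
import math
import random


def control_variate(xs, ys, mu_y):
    """Return (estimate, c_hat, estimated variance) of Xbar + c_hat*(Ybar - mu_y)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    c_hat = -sxy / syy                        # estimate of c* = -Cov(X, Y)/Var(Y)
    estimate = xbar + c_hat * (ybar - mu_y)
    cov, var_x, var_y = sxy / (n - 1), sxx / (n - 1), syy / (n - 1)
    est_var = (var_x - cov ** 2 / var_y) / n  # plug-in version of the formula above
    return estimate, c_hat, est_var


rng = random.Random(13)
us = [rng.random() for _ in range(10000)]
xs = [math.exp(u) for u in us]                # X = e^U, so theta = e - 1
est, c_hat, v = control_variate(xs, us, 0.5)  # control Y = U with E[U] = 1/2
print(est, math.e - 1, c_hat, v)
```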


Remark Another way of doing the computations is to make use of a standard computer package for simple linear regression models. For if we consider the simple linear regression model

$$X = a + bY + e$$

where e is a random variable with mean 0 and variance $\sigma^2$, then $\hat{a}$ and $\hat{b}$, the least squares estimators of a and b based on the data $X_i, Y_i$, $i = 1, \ldots, n$, are

$$\hat{b} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}, \qquad \hat{a} = \bar{X} - \hat{b}\bar{Y}$$

Therefore, $\hat{b} = -\hat{c}^*$. In addition, since

$$\bar{X} + \hat{c}^*(\bar{Y} - \mu_y) = \bar{X} - \hat{b}(\bar{Y} - \mu_y) = \hat{a} + \hat{b}\mu_y$$

it follows that the control variate estimate is the evaluation of the estimated regression line at the value $Y = \mu_y$. Also, because $\hat{\sigma}^2$, the regression estimate of $\sigma^2$, is the estimate of $\mathrm{Var}(X - \hat{b}Y) = \mathrm{Var}(X + \hat{c}^*Y)$, it follows that the estimated variance of the control variate estimator $\bar{X} + \hat{c}^*(\bar{Y} - \mu_y)$ is $\hat{\sigma}^2/n$. □
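The estimators above, and the regression interpretation in the Remark, can be sketched in a few lines of Python. This is a minimal illustration that assumes the run outputs are held in plain lists. The helpers `control_variate_estimate` and `regression_line_at` are hypothetical names, not library functions. The usage lines borrow the setting of Example 8h: $X = e^U$ with the control $Y = U$, $\mu_y = 1/2$, and a true mean of $e - 1$.

```python
import math
import random


def control_variate_estimate(xs, ys, mu_y):
    """Return (controlled estimate, c_hat, estimated variance of the estimate).

    xs, ys: outputs X_i and Y_i of n independent runs; mu_y: the known E[Y].
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    c_hat = -sxy / syy
    # Var(X + c*(Y - mu_y)) = Var(X) - Cov^2(X, Y) / Var(Y), with sample estimates
    var_x, var_y, cov_xy = sxx / (n - 1), syy / (n - 1), sxy / (n - 1)
    var_one_run = var_x - cov_xy ** 2 / var_y
    return xbar + c_hat * (ybar - mu_y), c_hat, var_one_run / n


def regression_line_at(xs, ys, y0):
    """Least squares fit of X = a + bY, evaluated at Y = y0."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b_hat = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((y - ybar) ** 2 for y in ys))
    a_hat = xbar - b_hat * ybar
    return a_hat + b_hat * y0


rng = random.Random(1)
us = [rng.random() for _ in range(5000)]
xs = [math.exp(u) for u in us]  # X = e^U, whose mean is e - 1
est, c_hat, var_est = control_variate_estimate(xs, us, 0.5)
print(est, c_hat, var_est, regression_line_at(xs, us, 0.5))
```

The two printed point estimates coincide up to round-off, which matches the identity $\bar{X} + \hat{c}^*(\bar{Y} - \mu_y) = \hat{a} + \hat{b}\mu_y$.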


Example 8f Suppose, as in Example 8b, that we wanted to use simulation to estimate the reliability function

$$r(p_1, \ldots, p_n) = E[\phi(S_1, \ldots, S_n)]$$

where

$$S_i = \begin{cases} 1 & \text{if component } i \text{ works} \\ 0 & \text{otherwise} \end{cases}$$

Since $E[S_i] = p_i$, it follows that

$$E\left[\sum_{i=1}^{n} S_i\right] = \sum_{i=1}^{n} p_i$$

Hence, we can use the number of working components, $Y = \sum_i S_i$, as a control variate of the estimator $X = \phi(S_1, \ldots, S_n)$. Since $\sum_i S_i$ and $\phi(S_1, \ldots, S_n)$ are both increasing functions of the $S_i$, they are positively correlated, and thus the sign of $c^*$ is negative. □
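A minimal Python sketch of Example 8f follows. The structure function is a hypothetical 2-out-of-3 system, which works when at least two of its three components work. It was chosen only because its exact reliability is easy to check: with $p = (0.9, 0.8, 0.7)$ it equals 0.902. The coefficient $c^*$ is estimated from the same runs, as described earlier.

```python
import random


def phi(states):
    """Hypothetical structure function: works if at least 2 of the 3 components work."""
    return 1 if sum(states) >= 2 else 0


def reliability_estimates(p, n_runs, seed=0):
    """Return (raw estimate, controlled estimate, c_hat) for r(p_1, ..., p_n)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_runs):
        s = [1 if rng.random() < pi else 0 for pi in p]  # S_i = 1 with probability p_i
        xs.append(phi(s))   # raw estimator X = phi(S_1, ..., S_n)
        ys.append(sum(s))   # control Y = number of working components
    mu_y = sum(p)           # E[Y] = sum of the p_i
    xbar, ybar = sum(xs) / n_runs, sum(ys) / n_runs
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    c_hat = -sxy / syy
    return xbar, xbar + c_hat * (ybar - mu_y), c_hat


raw, controlled, c_hat = reliability_estimates([0.9, 0.8, 0.7], 20000)
print(raw, controlled, c_hat)  # c_hat should come out negative
```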


Example 8g Consider a queueing system in which customers arrive in accordance with a nonhomogeneous Poisson process with intensity function $\lambda(s)$, $s > 0$. Suppose that the service times are independent random variables having distribution G and are also independent of the arrival times. Suppose we were interested in estimating the total time spent in the system by all customers arriving before time t. That is, if we let $W_i$ denote the amount of time that the ith entering customer spends in the system, then we are interested in $\theta = E[X]$, where

$$X = \sum_{i=1}^{N(t)} W_i$$

and where N(t) is the number of arrivals by time t. A natural quantity to use as a control in this situation is the total of the service times of all these customers. That is, let $S_i$ denote the service time of the ith customer and set

$$Y = \sum_{i=1}^{N(t)} S_i$$

Since the service times are independent of N(t), it follows that

$$E[Y] = E[S]\,E[N(t)]$$

where E[S], the mean service time, and E[N(t)], the mean number of arrivals by time t, are both known quantities. □


Example 8h As in Example 8d, suppose we were interested in using simulation to compute $\theta = E[e^U]$. Here, a natural variate to use as a control is the random number U. To see what sort of improvement over the raw estimator is possible, note that

$$\begin{aligned} \mathrm{Cov}(e^U, U) &= E[Ue^U] - E[U]\,E[e^U] \\ &= \int_0^1 x e^x \, dx - \frac{e - 1}{2} \\ &= 1 - \frac{e - 1}{2} = 0.14086 \end{aligned}$$

Because $\mathrm{Var}(U) = \frac{1}{12}$, it follows from (8.2) that

$$\begin{aligned} \mathrm{Var}\left(e^U + c^*\left(U - \tfrac{1}{2}\right)\right) &= \mathrm{Var}(e^U) - 12(0.14086)^2 \\ &= 0.2420 - 0.2380 = 0.0039 \end{aligned}$$

where the above used, from Example 8d, that $\mathrm{Var}(e^U) = 0.2420$. Hence, in this case, the use of the control variate U can lead to a variance reduction of up to 98.4 percent. □
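The numbers in Example 8h can be checked with a short simulation. The Python sketch below is an illustration, not the book's code. It uses the exact $c^* = -12\left(1 - (e - 1)/2\right)$ derived above and compares the sample variance of the raw estimator $e^U$ with that of the controlled estimator. The two variances should come out near 0.2420 and 0.0039.

```python
import math
import random


def sample_var(v):
    """Sample variance with divisor len(v) - 1."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


rng = random.Random(2)
n = 10000
us = [rng.random() for _ in range(n)]
raw = [math.exp(u) for u in us]                  # raw estimator e^U
c_star = -12 * (1 - (math.e - 1) / 2)            # -Cov(e^U, U) / Var(U)
controlled = [x + c_star * (u - 0.5) for x, u in zip(raw, us)]
print(sample_var(raw), sample_var(controlled), sum(controlled) / n)
```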


Example 8i A List Recording Problem Suppose we are given a set of 
n elements, numbered 1 through n, which are to be arranged in an ordered list. 
At each unit of time a request is made to retrieve one of these elements, with 
the request being for element i with probability p(i), $\sum_{i=1}^{n} p(i) = 1$. After being
requested, the element is put back in the list but not necessarily in the same 
position. For example, a common reordering rule is to interchange the requested 
element with the one immediately preceding it. Thus, if n = 4 and the present 
ordering is 1, 4, 2, 3, then under this rule a request for element 2 would result 
in the reorder 1, 2, 4, 3. Starting with an initial ordering that is equally likely 
to be any of the n! orderings and using this interchange rule, suppose we are 
interested in determining the expected sum of the positions of the first N elements 
requested. How can we efficiently accomplish this by simulation? 

One effective way is as follows. The “natural” way of simulating the above is first to generate a random permutation of 1, 2, ..., n to establish the initial ordering, and then at each of the next N periods determine the element requested by generating a random number U and then letting the request be for element j if $\sum_{k=1}^{j-1} p(k) < U \le \sum_{k=1}^{j} p(k)$. However, a better technique is to generate the element requested in such a way that small values of U correspond to elements close to the front. Specifically, if the present ordering is $i_1, i_2, \ldots, i_n$, then generate the element requested by generating a random number U and then letting the selection be for $i_j$ if $\sum_{k=1}^{j-1} p(i_k) < U \le \sum_{k=1}^{j} p(i_k)$. For example, if n = 4 and the present ordering is 3, 1, 2, 4, then we should generate U and let the selection be for 3 if $U \le p(3)$, let it be for 1 if $p(3) < U \le p(3) + p(1)$, and so on. As small values of U thus correspond to elements near the front, we can use $\sum_{r=1}^{N} U_r$ as a control variable, where $U_r$ is the random number used for the rth request in a run. That is, if $P_r$ is the position of the rth selected element in a run, then rather than just using the raw estimator $\sum_{r=1}^{N} P_r$ we should use

$$\sum_{r=1}^{N} P_r + c^*\left(\sum_{r=1}^{N} U_r - \frac{N}{2}\right)$$

where

$$c^* = -\frac{\mathrm{Cov}\left(\sum_{r=1}^{N} P_r, \sum_{r=1}^{N} U_r\right)}{N/12}$$


and where the above covariance should be estimated using the data from all the 
simulated runs. 

Although the variance reduction obtained will, of course, depend on the prob- 
abilities p(i),i=1,...,n, and the value of N, a small study indicates that when 
n= 50 and the p(i) are approximately equal, then for 15 < N < 50 the variance 
of the controlled estimator is less than z5 the variance of the raw simulation 
estimator. o 
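One way to run this experiment is sketched below in Python. It assumes the transposition rule and the front-to-back selection scheme described above. The helpers `one_run` and `list_estimates` and all parameter values are illustrative. Elements are indexed from 0 in the code, but positions are counted from 1. With all p(i) equal, the expected sum of positions is $N(n + 1)/2$ whatever the ordering, which gives an exact value to test against. The same code applies to unequal p(i), but no closed form is assumed in that case.

```python
import random


def one_run(p, N, rng):
    """Simulate N requests; return (sum of positions P_r, sum of random numbers U_r)."""
    n = len(p)
    order = list(range(n))
    rng.shuffle(order)                  # initial ordering: uniform over the n! orderings
    total_pos, total_u = 0, 0.0
    for _ in range(N):
        u = rng.random()
        cum, j = 0.0, n - 1             # j = n - 1 guards against round-off in cum
        for idx, elem in enumerate(order):
            cum += p[elem]              # small U selects elements near the front
            if u <= cum:
                j = idx
                break
        total_pos += j + 1              # positions are numbered from 1
        total_u += u
        if j > 0:                       # interchange with the preceding element
            order[j - 1], order[j] = order[j], order[j - 1]
    return total_pos, total_u


def sample_var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def list_estimates(p, N, runs, seed=0):
    """Return per-run raw values (sum of P_r) and controlled values, c estimated from the runs."""
    rng = random.Random(seed)
    data = [one_run(p, N, rng) for _ in range(runs)]
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    xbar, ybar = sum(xs) / runs, sum(ys) / runs
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (runs - 1)
    c_hat = -cov / (N / 12)             # Var(U_1 + ... + U_N) = N/12 is known exactly
    controlled = [x + c_hat * (y - N / 2) for x, y in zip(xs, ys)]
    return xs, controlled


n, N = 50, 30
xs, controlled = list_estimates([1 / n] * n, N, runs=500)
print(sum(xs) / 500, sum(controlled) / 500, N * (n + 1) / 2)
print(sample_var(xs), sample_var(controlled))
```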


Of course, one can use more than a single variable as a control. For example, if a simulation results in output variables $Y_i$, $i = 1, \ldots, k$, and $E[Y_i] = \mu_i$ is known, then for any constants $c_i$, $i = 1, \ldots, k$, we may use

$$X + \sum_{i=1}^{k} c_i (Y_i - \mu_i)$$

as an unbiased estimator of E[X].


Example 8j Blackjack The game of blackjack is often played with the dealer shuffling multiple decks of cards, putting aside used cards, and finally reshuffling when the number of remaining cards is below some limit. Let us say that a new round begins each time the dealer reshuffles, and suppose we are interested in using simulation to estimate E[X], a player’s expected winnings per round, where we assume that the player is employing some fixed strategy which might be of the type that “counts cards” that have already been played in the round and stakes different amounts depending on the “count.” We will assume that the game consists of a single player against the dealer.

The randomness in this game results from the shuffling of the cards by the dealer. If the dealer uses k decks of 52 cards, then we can generate the shuffle by generating a random permutation of the numbers 1 through 52k; let $I_1, \ldots, I_{52k}$ denote this permutation. If we now set

$$u_j = I_j \bmod 13 + 1$$

and let

$$v_j = \min(u_j, 10)$$

then $v_j$, $j = 1, \ldots, 52k$, represents the successive values of the shuffled cards, with 1 standing for an ace.

Let N denote the number of hands played in a round, and let $B_j$ denote the amount bet on hand j. To reduce the variance, we can use a control variable that is large when the player is dealt more good hands than the dealer, and is small in the reverse case. Since being dealt 19 or better is good, let us define

$W_j = 1$ if the player’s two dealt cards on deal j add to at least 19

and let $W_j$ be 0 otherwise. Similarly, let

$Z_j = 1$ if the dealer’s two dealt cards on deal j add to at least 19

and let $Z_j$ be 0 otherwise. Since $W_j$ and $Z_j$ clearly have the same distribution, it follows that $E[W_j - Z_j] = 0$, and it is not difficult to show that

$$E\left[\sum_{j=1}^{N} B_j (W_j - Z_j)\right] = 0$$

Thus, we recommend using $\sum_{j=1}^{N} B_j (W_j - Z_j)$ as a control variable. Of course, it is not clear that 19 is the best value, and one should experiment with letting 18 or even 20 be the critical value. However, some preliminary work indicates that 19 works best, and it has resulted in variance reductions of 15 percent or more depending on the strategy employed by the player. An even greater variance reduction should result if we use two control variables. One control variable is defined as before, with the exception that the $W_j$ and $Z_j$ are defined to be 1 if the hand is either 19 or 20. The second variable is again similar, but this time its indicators are 1 when the hands consist of blackjacks. □
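The card-value mapping $u_j = I_j \bmod 13 + 1$, $v_j = \min(u_j, 10)$ is easy to check in code. The Python sketch below builds one shuffled shoe of k decks; the function name is hypothetical. In a full shoe, each of the values 1 through 9 should appear 4k times and the value 10 should appear 16k times. The sketch covers only the shuffle. It leaves out the player's strategy and the control variable, because those depend on details the text leaves open.

```python
import random


def shuffled_card_values(k, rng):
    """Return v_1, ..., v_{52k} for a shuffled shoe of k decks (1 = ace, 10 = ten or face card)."""
    perm = list(range(1, 52 * k + 1))
    rng.shuffle(perm)                    # I_1, ..., I_{52k}: a random permutation of 1..52k
    values = []
    for i_j in perm:
        u_j = i_j % 13 + 1               # u_j takes each value 1..13 exactly 4k times
        values.append(min(u_j, 10))      # v_j = min(u_j, 10)
    return values


shoe = shuffled_card_values(6, random.Random(0))
print(shoe[:10])
```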
When multiple control variates are used, the computations can be performed by using a computer program for the multiple linear regression model

$$X = a + \sum_{i=1}^{k} b_i Y_i + e$$

where e is a random variable with mean 0 and variance $\sigma^2$. Letting $\hat{c}_i^*$ be the estimate of the best $c_i$, for $i = 1, \ldots, k$, then

$$\hat{c}_i^* = -\hat{b}_i, \qquad i = 1, \ldots, k$$

where $\hat{b}_i$, $i = 1, \ldots, k$, are the least squares regression estimates of $b_i$, $i = 1, \ldots, k$. The value of the controlled estimate can be obtained from

$$\bar{X} + \sum_{i=1}^{k} \hat{c}_i^* (\bar{Y}_i - \mu_i) = \hat{a} + \sum_{i=1}^{k} \hat{b}_i \mu_i$$

That is, the controlled estimate is just the estimated multiple regression line evaluated at the point $(\mu_1, \ldots, \mu_k)$.

The variance of the controlled estimate can be obtained by dividing the regression estimate of $\sigma^2$ by the number of simulation runs.
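The regression route for several controls can be sketched without a statistics package by solving the normal equations directly. In the Python sketch below, the slopes $\hat{b}_i$ come from the centered least squares equations and $\hat{c}_i^* = -\hat{b}_i$; the helper names are hypothetical. The example reuses the setting of Example 8h with two controls, U and $U^2$, whose means 1/2 and 1/3 are known.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def multi_control_estimate(xs, ys, mus):
    """xs: outputs X; ys: tuples (Y_1, ..., Y_k), one per run; mus: known means mu_i."""
    n, k = len(xs), len(mus)
    xbar = sum(xs) / n
    ybar = [sum(y[i] for y in ys) / n for i in range(k)]
    # Centered cross-product sums (the common factor 1/(n - 1) cancels in the slopes)
    syy = [[sum((y[i] - ybar[i]) * (y[j] - ybar[j]) for y in ys) for j in range(k)]
           for i in range(k)]
    syx = [sum((y[i] - ybar[i]) * (x - xbar) for x, y in zip(xs, ys)) for i in range(k)]
    b_hat = solve(syy, syx)                 # least squares slopes b_1, ..., b_k
    c_hat = [-b for b in b_hat]             # estimates of the best c_i
    est = xbar + sum(c * (yb - mu) for c, yb, mu in zip(c_hat, ybar, mus))
    return est, c_hat


rng = random.Random(3)
us = [rng.random() for _ in range(5000)]
xs = [math.exp(u) for u in us]
ys = [(u, u * u) for u in us]
est, c_hat = multi_control_estimate(xs, ys, [0.5, 1 / 3])
print(est, c_hat)
```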


Remarks 


1. Since the variance of the controlled estimator is not known in advance, one often performs the simulation in two stages. In the first stage a small number of runs are performed so as to give a rough estimate of $\mathrm{Var}(X + c^*(Y - \mu_y))$. (This estimate can be obtained from a simple linear regression program, where Y is the independent and X is the dependent variable, by using the estimate of $\sigma^2$.) We can then fix the number of trials needed in the second stage so that the variance of the final estimator is within an acceptable bound.

2. A valuable way of interpreting the control variable approach is that it combines estimators of θ. That is, suppose the values of X and W are both determined by the simulation, and suppose $E[X] = E[W] = \theta$. Then we may consider any unbiased estimator of the form

$$\alpha X + (1 - \alpha) W$$

The best such estimator, which is obtained by choosing α to minimize the variance, is given by letting $\alpha = \alpha^*$, where

$$\alpha^* = \frac{\mathrm{Var}(W) - \mathrm{Cov}(X, W)}{\mathrm{Var}(X) + \mathrm{Var}(W) - 2\,\mathrm{Cov}(X, W)} \qquad (8.3)$$

Now if $E[Y] = \mu_y$ is known, we have the two unbiased estimators X and $X + Y - \mu_y$. The combined estimator can then be written as

$$(1 - c)X + c(X + Y - \mu_y) = X + c(Y - \mu_y)$$

To go the other way in the equivalence between control variates and combining estimators, suppose that $E[X] = E[W] = \theta$. Then if we use X, controlling with the variable $Y = X - W$, which is known to have mean 0, we then obtain an estimator of the form

$$X + c(X - W) = (1 + c)X - cW$$

which is a combined estimator with $\alpha = 1 + c$.

3. With the interpretation given in Remark 2, the antithetic variable approach may be regarded as a special case of control variables. That is, if $E[X] = \theta$, where $X = h(U_1, \ldots, U_n)$, then also $E[W] = \theta$, where $W = h(1 - U_1, \ldots, 1 - U_n)$. Hence, we can combine to get an estimator of the form $\alpha X + (1 - \alpha) W$. Since $\mathrm{Var}(X) = \mathrm{Var}(W)$, as X and W have the same distribution, it follows from Equation (8.3) that the best value of α is $\alpha = \frac{1}{2}$, and this is the antithetic variable estimator.

4. Remark 3 indicates why it is not usually possible to effectively combine antithetic variables with a control variable. If a control variable Y has a large positive (negative) correlation with $h(U_1, \ldots, U_n)$, then it probably has a large negative (positive) correlation with $h(1 - U_1, \ldots, 1 - U_n)$. Consequently, it is unlikely to have a large correlation with the antithetic estimator

$$\frac{h(U_1, \ldots, U_n) + h(1 - U_1, \ldots, 1 - U_n)}{2}$$
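Equation (8.3) can be applied with sample variances and covariances in place of the true ones. The Python sketch below does this for the antithetic pair $X = e^U$ and $W = e^{1-U}$ of Remark 3; the helper name is hypothetical. Since X and W have the same distribution, the estimated $\alpha^*$ should come out close to 1/2.

```python
import math
import random


def best_alpha(xs, ws):
    """Estimate alpha* of Equation (8.3) from paired outputs X_i, W_i."""
    n = len(xs)
    xbar, wbar = sum(xs) / n, sum(ws) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sww = sum((w - wbar) ** 2 for w in ws)
    sxw = sum((x - xbar) * (w - wbar) for x, w in zip(xs, ws))
    # The common divisor n - 1 cancels between numerator and denominator
    return (sww - sxw) / (sxx + sww - 2 * sxw)


rng = random.Random(4)
us = [rng.random() for _ in range(10000)]
xs = [math.exp(u) for u in us]          # X = h(U) with h(u) = e^u
ws = [math.exp(1 - u) for u in us]      # W = h(1 - U), the antithetic partner
alpha = best_alpha(xs, ws)
combined = sum(alpha * x + (1 - alpha) * w for x, w in zip(xs, ws)) / len(us)
print(alpha, combined)
```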
Recall the conditional variance formula proved in Section 2.10 of Chapter 2:

$$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$$

Since both terms on the right are nonnegative, because a variance is always nonnegative, we see that

$$\mathrm{Var}(X) \ge \mathrm{Var}(E[X \mid Y]) \qquad (8.4)$$

Now suppose we are interested in performing a simulation study so as to ascertain the value of $\theta = E[X]$, where X is an output variable of a simulation run. Also, suppose there is a second variable Y, such that $E[X \mid Y]$ is known and takes on a value that can be determined from the simulation run. Since

$$E\left[E[X \mid Y]\right] = E[X] = \theta$$

it follows that $E[X \mid Y]$ is also an unbiased estimator of θ. Thus, from (8.4) it follows that as an estimator of θ, $E[X \mid Y]$ is superior to the (raw) estimator X.


Remarks To understand why the conditional expectation estimator is superior to the raw estimator, note first that we are performing the simulation to estimate the unknown value of E[X]. We can now imagine that a simulation run proceeds in two stages: First, we observe the simulated value of the random variable Y and then the simulated value of X. However, if after observing Y we are now able to compute the (conditional) expected value of X, then by using this value we obtain an estimate of E[X], which eliminates the additional variance involved in simulating the actual value of X. □


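At this point one might consider further improvements by using an estimator of the type $\alpha X + (1 - \alpha) E[X \mid Y]$. However, by Equation (8.3) the best estimator of this type has $\alpha = \alpha^*$, where

$$\alpha^* = \frac{\mathrm{Var}(E[X \mid Y]) - \mathrm{Cov}(X, E[X \mid Y])}{\mathrm{Var}(X) + \mathrm{Var}(E[X \mid Y]) - 2\,\mathrm{Cov}(X, E[X \mid Y])}$$

We now show that $\alpha^* = 0$, so that combining the estimators X and $E[X \mid Y]$ does not improve on just using $E[X \mid Y]$. It suffices to show that the numerator vanishes, that is, that $\mathrm{Cov}(X, E[X \mid Y]) = \mathrm{Var}(E[X \mid Y])$.

First note that

$$\begin{aligned} \mathrm{Var}(E[X \mid Y]) &= E[(E[X \mid Y])^2] - (E[E[X \mid Y]])^2 \\ &= E[(E[X \mid Y])^2] - (E[X])^2 \qquad (8.5) \end{aligned}$$

On the other hand,

$$\begin{aligned} \mathrm{Cov}(X, E[X \mid Y]) &= E[X E[X \mid Y]] - E[X]\,E[E[X \mid Y]] \\ &= E[X E[X \mid Y]] - (E[X])^2 \\ &= E\left[E[X E[X \mid Y] \mid Y]\right] - (E[X])^2 \qquad \text{(conditioning on } Y\text{)} \\ &= E\left[E[X \mid Y]\,E[X \mid Y]\right] - (E[X])^2 \qquad \text{(since given } Y\text{, } E[X \mid Y] \text{ is a constant)} \\ &= \mathrm{Var}(E[X \mid Y]) \qquad \text{[from (8.5)]} \end{aligned}$$

Thus, we see that no additional variance reduction is possible by combining the estimators X and $E[X \mid Y]$.

We now illustrate the use of “conditioning” by a series of examples.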
Example 8k Let us reconsider our use of simulation to estimate π. In Example 3a of Chapter 3, we showed how we can estimate π by determining how often a randomly chosen point in the square of area 4 centered around the origin falls within the inscribed circle of radius 1. Specifically, if we let $V_i = 2U_i - 1$, where $U_i$, $i = 1, 2$, are random numbers, and set

$$I = \begin{cases} 1 & \text{if } V_1^2 + V_2^2 \le 1 \\ 0 & \text{otherwise} \end{cases}$$

then, as noted in Example 3a, $E[I] = \pi/4$.

The use of the average of successive values of I to estimate π/4 can be improved upon by using $E[I \mid V_1]$ rather than I. Now

$$\begin{aligned} E[I \mid V_1 = v] &= P\{V_1^2 + V_2^2 \le 1 \mid V_1 = v\} \\ &= P\{v^2 + V_2^2 \le 1 \mid V_1 = v\} \\ &= P\{V_2^2 \le 1 - v^2\} \qquad \text{by the independence of } V_1 \text{ and } V_2 \\ &= P\{-(1 - v^2)^{1/2} \le V_2 \le (1 - v^2)^{1/2}\} \\ &= \int_{-(1 - v^2)^{1/2}}^{(1 - v^2)^{1/2}} \frac{1}{2}\, dx \qquad \text{since } V_2 \text{ is uniform over } (-1, 1) \\ &= (1 - v^2)^{1/2} \end{aligned}$$

Hence,

$$E[I \mid V_1] = (1 - V_1^2)^{1/2}$$

and so the estimator $(1 - V_1^2)^{1/2}$ also has mean π/4 and has a smaller variance than I. Since

$$\begin{aligned} E\left[(1 - V_1^2)^{1/2}\right] &= \int_{-1}^{1} (1 - x^2)^{1/2} \left(\frac{1}{2}\right) dx \\ &= \int_0^1 (1 - x^2)^{1/2}\, dx \\ &= E\left[(1 - U^2)^{1/2}\right] \end{aligned}$$

we can simplify somewhat by using the estimator $(1 - U^2)^{1/2}$, where U is a random number.
The improvement in variance obtained by using the estimator $(1 - U^2)^{1/2}$ over the estimator I is easily determined:

$$\begin{aligned} \mathrm{Var}\left[(1 - U^2)^{1/2}\right] &= E[1 - U^2] - \left(\frac{\pi}{4}\right)^2 \\ &= \frac{2}{3} - \left(\frac{\pi}{4}\right)^2 \approx 0.0498 \end{aligned}$$

where the first equality used the identity $\mathrm{Var}(W) = E[W^2] - (E[W])^2$. On the other hand, because I is a Bernoulli random variable having mean π/4, we have

$$\mathrm{Var}(I) = \left(\frac{\pi}{4}\right)\left(1 - \frac{\pi}{4}\right) \approx 0.1686$$

thus showing that conditioning results in a 70.44 percent reduction in variance.


(In addition, only one rather than two random numbers is needed for each 
simulation run, although the computational cost of having to compute a square 
root must be paid.) 
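The variances 0.1686 and 0.0498 can be reproduced with a short Python simulation. In this sketch the seed and run count are arbitrary. For each run it computes the raw indicator I from two random numbers and the conditional estimator $(1 - U^2)^{1/2}$ from a single random number.

```python
import math
import random


def sample_var(v):
    """Sample variance with divisor len(v) - 1."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


rng = random.Random(5)
n = 20000
raw, cond = [], []
for _ in range(n):
    v1, v2 = 2 * rng.random() - 1, 2 * rng.random() - 1
    raw.append(1.0 if v1 * v1 + v2 * v2 <= 1 else 0.0)   # I: the point falls in the circle
    u = rng.random()
    cond.append(math.sqrt(1 - u * u))                     # E[I | V_1], written as (1 - U^2)^(1/2)

print(4 * sum(raw) / n, 4 * sum(cond) / n)                 # both estimate pi
print(sample_var(raw), sample_var(cond))                   # near 0.1686 and 0.0498
```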

Since the function $(1 - u^2)^{1/2}$ is clearly a monotone decreasing function of u in the region 0 < u < 1, it follows that the estimator $(1 - U^2)^{1/2}$ can be improved upon by using antithetic variables. That is, the estimator

$$\frac{1}{2}\left[(1 - U^2)^{1/2} + (1 - (1 - U)^2)^{1/2}\right]$$

has smaller variance than $\frac{1}{2}\left[(1 - U_1^2)^{1/2} + (1 - U_2^2)^{1/2}\right]$.

Another way of improving the estimator $(1 - U^2)^{1/2}$ is by using a control variable. A natural control variable in this case is $U^2$ and, because $E[U^2] = \frac{1}{3}$, we could use an estimator of the type

$$(1 - U^2)^{1/2} + c\left(U^2 - \frac{1}{3}\right)$$

The best c, namely $c^* = -\mathrm{Cov}[(1 - U^2)^{1/2}, U^2] / \mathrm{Var}(U^2)$, can be estimated by using the simulation to estimate the covariance term. (We could also have tried to use U as a control variable. It makes a difference which one we use, because a correlation between two random variables is only a measure of their “linear dependence” rather than of their total dependence. But the use of $U^2$ leads to a greater improvement; see Exercise 15.) □
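The two refinements can be compared empirically. The Python sketch below uses an illustrative seed and run count. For each run it computes four estimators:

- the average of two independent conditional estimates,
- the antithetic average,
- the controlled estimator with U as the control,
- the controlled estimator with $U^2$ as the control.

In each controlled estimator, c is estimated from the same runs. In line with the text, the antithetic pair should beat two independent draws, and $U^2$ should be a better control than U.

```python
import math
import random


def sample_var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def sample_cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def f(u):
    return math.sqrt(1 - u * u)


rng = random.Random(6)
n = 20000
us = [rng.random() for _ in range(n)]
us2 = [rng.random() for _ in range(n)]
base = [f(u) for u in us]
indep = [(f(u) + f(w)) / 2 for u, w in zip(us, us2)]   # two independent random numbers
anti = [(f(u) + f(1 - u)) / 2 for u in us]             # antithetic pair U, 1 - U


def controlled(ctrl, ctrl_mean):
    """base + c_hat * (ctrl - ctrl_mean), with c_hat = -Cov(base, ctrl) / Var(ctrl)."""
    c_hat = -sample_cov(base, ctrl) / sample_var(ctrl)
    return [x + c_hat * (y - ctrl_mean) for x, y in zip(base, ctrl)]


with_u = controlled(us, 1 / 2)
with_u2 = controlled([u * u for u in us], 1 / 3)
for name, v in [("indep", indep), ("anti", anti), ("U", with_u), ("U^2", with_u2)]:
    print(name, 4 * sum(v) / n, sample_var(v))
```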


Example 8l Suppose that Y is an exponential random variable with mean 1, and suppose that, conditional on Y = y, X is a normal random variable with mean y and variance 4. How can we use simulation to efficiently estimate $\theta = P\{X > 1\}$?

The raw simulation approach would start by generating the value of Y, say by generating a random number U and setting $Y = -\log(U)$. If Y = y, it would then generate the value of a random variable X that is normal with mean y and variance 4, and set

$$I = \begin{cases} 1 & \text{if } X > 1 \\ 0 & \text{otherwise} \end{cases}$$

The average value of I over many runs would be the raw simulation estimator. The preceding estimator can, however, be improved by noting that if Y = y, then

$$\frac{X - y}{2}$$

is a standard normal random variable. Consequently,

$$E[I \mid Y = y] = P\{X > 1 \mid Y = y\} = P\left\{\frac{X - y}{2} > \frac{1 - y}{2}\right\} = \bar{\Phi}\left(\frac{1 - y}{2}\right)$$

where $\bar{\Phi}(x) = 1 - \Phi(x)$ is the probability that a standard normal random variable exceeds x. Therefore, the average value of $\bar{\Phi}\left(\frac{1 - Y}{2}\right)$ obtained over many runs is superior to the raw simulation estimator.

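Because the conditional expectation estimator $\bar{\Phi}\left(\frac{1 - Y}{2}\right)$ is monotone in Y, it can be further improved by using antithetic variables. That is, a random number U can be used to give the estimator

$$\frac{\bar{\Phi}\left(\frac{1 + \log(U)}{2}\right) + \bar{\Phi}\left(\frac{1 + \log(1 - U)}{2}\right)}{2}$$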


Another possibility, aside from antithetic variables, is to use Y as a control 
variable. Whether it is more effective to use antithetic variables or to use Y as a 
control variable can only be determined by a simulation study. o 
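A minimal Python sketch of Example 8l compares the raw estimator I, the conditional estimator $\bar{\Phi}((1 - Y)/2)$, and its antithetic version. It writes $\bar{\Phi}(x) = \frac{1}{2}\operatorname{erfc}(x/\sqrt{2})$ using the standard library's `math.erfc`. The seed and number of runs are arbitrary. The control-variable alternative mentioned above is not included.

```python
import math
import random


def phibar(x):
    """Standard normal tail probability 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def open_unit(rng):
    """A random number in (0, 1), so that both log(u) and log(1 - u) are finite."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def sample_var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


rng = random.Random(7)
n = 20000
raw, cond, anti = [], [], []
for _ in range(n):
    u = open_unit(rng)
    y = -math.log(u)                      # Y exponential with mean 1
    x = rng.gauss(y, 2.0)                 # given Y = y, X is normal with mean y, sd 2
    raw.append(1.0 if x > 1 else 0.0)     # I
    cond.append(phibar((1 - y) / 2))      # E[I | Y]
    anti.append((phibar((1 + math.log(u)) / 2)
                 + phibar((1 + math.log(1 - u)) / 2)) / 2)

for name, v in [("raw", raw), ("conditional", cond), ("antithetic", anti)]:
    print(name, sum(v) / n, sample_var(v))
```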


In our next example we use the conditional expectation approach to efficiently 
estimate the probability that a compound random variable exceeds some fixed 
value. 


Example 8m Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed positive random variables that are independent of the nonnegative integer valued random variable N. The random variable

$$S = \sum_{i=1}^{N} X_i$$

is said to be a compound random variable. In an insurance application, $X_i$ could represent the amount of the ith claim made to an insurance company, and N could represent the number of claims made by some specified time t; S would be the total claim amount made by time t. In such applications, N is often assumed to be either a Poisson random variable (in which case S is called a compound Poisson random variable) or a mixed Poisson random variable. We say that N is a mixed Poisson random variable if there is another random variable Λ such that the conditional distribution of N, given that Λ = λ, is Poisson with mean λ. For instance, if Λ has a probability density function g(λ), then the probability mass function of the mixed Poisson random variable N is

$$P\{N = n\} = \int_0^\infty e^{-\lambda} \frac{\lambda^n}{n!}\, g(\lambda)\, d\lambda$$

Mixed Poisson random variables arise when there is a randomly determined “environmental state” that determines the mean of the (Poisson) number of events that occur in the time period of interest. The distribution function of Λ is called the mixing distribution.

Suppose that we want to use simulation to estimate

$$p = P\left\{\sum_{i=1}^{N} X_i > c\right\}$$

for some specified positive constant c. The raw simulation approach would first generate the value of N, say N = n, then generate the values of $X_1, \ldots, X_n$ and use them to determine the value of the raw simulation estimator

$$I = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} X_i > c \\ 0 & \text{otherwise} \end{cases}$$

The average value of I over many such runs would then be the estimator of p. We can improve upon the preceding by a conditional expectation approach that starts by generating the values of the $X_i$ in sequence, stopping when the sum of the generated values exceeds c. Let M denote the number that is needed; that is,

$$M = \min\left(n : \sum_{i=1}^{n} X_i > c\right)$$

If the generated value of M is m, then we use $P\{N \ge m\}$ as the estimate of p from this run. To see that this results in an estimator having a smaller variance than does the raw simulation estimator I, note that because the $X_i$ are positive,

$$I = 1 \iff N \ge M$$

Hence,

$$E[I \mid M] = P\{N \ge M \mid M\}$$

Now,

$$P\{N \ge M \mid M = m\} = P\{N \ge m \mid M = m\} = P\{N \ge m\}$$

where the final equality used the independence of N and M. Consequently, if the value of M obtained from the simulation is M = m, then the value of $E[I \mid M]$ obtained is $P\{N \ge m\}$.

The preceding conditional expectation estimator can be further improved by using a control variable. Let $\mu = E[X_i]$, and define

$$Y = \sum_{i=1}^{M} (X_i - \mu)$$

It can be shown that E[Y] = 0 (this follows from Wald's equation, since M is a stopping time for the sequence $X_1, X_2, \ldots$). To intuitively see why Y and the conditional expectation estimator $P\{N \ge M \mid M\}$ are strongly correlated, note first that the conditional expectation estimator will be small when M is large. But, because M is the number of the $X_i$ that need to be summed to exceed c, it follows that M will be large when the $X_i$ are small, which would make Y small. That is, both $E[I \mid M]$ and Y tend to be small at the same time. A similar argument shows that if $E[I \mid M]$ is large then Y also tends to be large. Thus, it is clear that $E[I \mid M]$ and Y are strongly positively correlated, indicating that Y should be an effective control variable. □
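The three estimators of Example 8m can be compared in the compound Poisson case. The Python sketch below assumes, purely for illustration, that N is Poisson with mean λ = 10, that the $X_i$ are exponential with mean μ = 1, and that c = 15. The helper names are hypothetical. The control Y has mean 0, and its coefficient is estimated from the runs.

```python
import math
import random


def poisson(lam, rng):
    """Poisson random variable by the inverse transform method."""
    u = rng.random()
    j, pj = 0, math.exp(-lam)
    cdf = pj
    while u >= cdf and pj > 0:    # the pj > 0 test prevents an endless loop on round-off
        j += 1
        pj *= lam / j
        cdf += pj
    return j


def poisson_tail(lam, m):
    """P{N >= m} for N Poisson with mean lam."""
    pj, cdf = math.exp(-lam), 0.0
    for j in range(m):
        cdf += pj
        pj *= lam / (j + 1)
    return 1.0 - cdf


def sample_var(v):
    mean = sum(v) / len(v)
    return sum((a - mean) ** 2 for a in v) / (len(v) - 1)


lam, mu, c, runs = 10.0, 1.0, 15.0, 20000
rng = random.Random(8)
raw, cond, ys = [], [], []
for _ in range(runs):
    n = poisson(lam, rng)
    s = sum(rng.expovariate(1 / mu) for _ in range(n))
    raw.append(1.0 if s > c else 0.0)          # raw estimator I
    total, m = 0.0, 0
    while total <= c:                          # M = min{n : X_1 + ... + X_n > c}
        total += rng.expovariate(1 / mu)
        m += 1
    cond.append(poisson_tail(lam, m))          # E[I | M] = P{N >= M}
    ys.append(total - m * mu)                  # control Y = sum of (X_i - mu), E[Y] = 0

cbar, ybar = sum(cond) / runs, sum(ys) / runs
c_hat = (-sum((a - cbar) * (b - ybar) for a, b in zip(cond, ys))
         / sum((b - ybar) ** 2 for b in ys))
controlled = [a + c_hat * b for a, b in zip(cond, ys)]
for name, v in [("raw", raw), ("conditional", cond), ("controlled", controlled)]:
    print(name, sum(v) / runs, sample_var(v))
```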


Example 8n A Finite Capacity Queueing Model Consider a queueing system in which arrivals enter only if there are fewer than N other customers in the system when they arrive. Any customer encountering N others upon arrival is deemed to be lost to the system. Suppose further that potential customers arrive in accordance with a Poisson process having rate λ, and suppose we are interested in using simulation to estimate the expected number of lost customers by some fixed time t.

A simulation run would consist of simulating the above system up to time t. If, for a given run, we let L denote the number of lost customers, then the average value of L, over all simulation runs, is the (raw) simulation estimator of the desired quantity E[L]. However, we can improve upon this estimator by conditioning upon the amount of time that the system is at capacity. That is, rather than using L, the actual number of lost customers up to time t, we consider $E[L \mid T_C]$, where $T_C$ is the total amount of time in the interval (0, t) that there are N customers in the system. Since customers are always arriving at the Poisson rate λ no matter what is happening within the system, it follows that

$$E[L \mid T_C] = \lambda T_C$$

Hence an improved estimator is obtained by ascertaining, for each run, the total time in which there are N customers in the system. Say $T_{C,i}$ is the time at capacity during the ith run. Then the improved estimator of E[L] is $\lambda \sum_{i=1}^{k} T_{C,i} / k$, where k is the number of simulation runs. (In effect, since the expected number of lost customers given the time at capacity $T_C$ is just $\lambda T_C$, what this estimator does is use the actual conditional expectation rather than simulating a Poisson random variable having this mean, which would only add to the variance of the estimator.)
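As an illustration of Example 8n, the Python sketch below simulates one assumed special case. It has a single server with exponential service times of rate μ = 1, Poisson arrivals with rate λ = 1, capacity N = 3, and horizon t = 50. These modelling choices are assumptions made only so the sketch is concrete; the conditioning argument itself does not depend on them. Each run records both the number lost, L, and the time at capacity, $T_C$.

```python
import math
import random


def one_run(lam, mu, cap, t_end, rng):
    """Return (L, T_C): customers lost and time spent with `cap` customers present."""
    t, in_sys, lost, time_full = 0.0, 0, 0, 0.0
    next_arr = rng.expovariate(lam)
    next_dep = math.inf
    while True:
        t_next = min(next_arr, next_dep)
        if in_sys == cap:                         # accumulate time at capacity up to t_end
            time_full += min(t_next, t_end) - t
        if t_next > t_end:
            break
        t = t_next
        if next_arr <= next_dep:                  # an arrival
            if in_sys == cap:
                lost += 1                         # finds N others present: lost
            else:
                in_sys += 1
                if in_sys == 1:
                    next_dep = t + rng.expovariate(mu)
            next_arr = t + rng.expovariate(lam)
        else:                                     # a service completion
            in_sys -= 1
            next_dep = t + rng.expovariate(mu) if in_sys > 0 else math.inf
    return lost, time_full


def sample_var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


lam, mu, cap, t_end, runs = 1.0, 1.0, 3, 50.0, 2000
rng = random.Random(9)
results = [one_run(lam, mu, cap, t_end, rng) for _ in range(runs)]
raw = [float(lost) for lost, _ in results]       # raw estimator L
cond = [lam * tc for _, tc in results]           # E[L | T_C] = lambda * T_C
print(sum(raw) / runs, sum(cond) / runs)
print(sample_var(raw), sample_var(cond))
```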
If the arrival process were a nonhomogeneous Poisson process having intensity function λ(s), 0 ≤ s ≤ t, then we would not be able to compute the conditional expected number of lost customers if we were given only the total time at capacity. What we now need is the actual times at which the system was at capacity. So let us condition on the intervals during which the system was at capacity. Now let $N_C$ denote the number of intervals during (0, t) during which the system is at capacity, and let those intervals be designated by $I_1, \ldots, I_{N_C}$. Then

$$E[L \mid N_C, I_1, \ldots, I_{N_C}] = \sum_{i=1}^{N_C} \int_{I_i} \lambda(s)\, ds$$

The use of the average value, over all simulation runs, of the above quantity leads to a better estimator of E[L] (in the sense of having a smaller mean square error) than the raw simulation estimator of the average number lost per run.

One can combine the preceding with other variance reduction techniques in estimating E[L]. For instance, if we let M denote the number of customers that actually enter the system by time t, then with N(t) equal to the number of arrivals by time t we have that

$$N(t) = M + L$$

Taking expectations gives that

$$\int_0^t \lambda(s)\, ds = E[M] + E[L]$$

Therefore, $\int_0^t \lambda(s)\, ds - M$ is also an unbiased estimator of E[L], which suggests the use of the combined estimator

$$\alpha \sum_{i=1}^{N_C} \int_{I_i} \lambda(s)\, ds + (1 - \alpha)\left(\int_0^t \lambda(s)\, ds - M\right)$$

The value of α to be used is given by Equation (8.3) and can be estimated from the simulation. □


Example 8o Suppose we wanted to estimate the expected sum of the times in the system of the first n customers in a queueing system. That is, if $W_i$ is the time that the ith customer spends in the system, we are interested in estimating

$$\theta = E\left[\sum_{i=1}^{n} W_i\right]$$

Let $S_i$ denote the “state of the system” at the moment that the ith customer arrives, and consider the estimator

$$\sum_{i=1}^{n} E[W_i \mid S_i]$$

Since

$$E\left[\sum_{i=1}^{n} E[W_i \mid S_i]\right] = \sum_{i=1}^{n} E\left[E[W_i \mid S_i]\right] = \sum_{i=1}^{n} E[W_i] = \theta$$

it follows that this is an unbiased estimator of θ. It can be shown¹ that, in a wide class of models, this estimator has a smaller variance than the raw simulation estimator $\sum_{i=1}^{n} W_i$. (It should be noted that whereas it is immediate that $E[W_i \mid S_i]$ has smaller variance than $W_i$, this does not imply, because of the covariance terms, that $\sum_i E[W_i \mid S_i]$ has smaller variance than $\sum_i W_i$.)

The quantity $S_i$, which refers to the state of the system as seen by the ith customer upon its arrival, is supposed to represent the least amount of information that enables us to compute the conditional expected time that the customer spends in the system. For example, if there is a single server and the service times are all exponential with mean μ, then $S_i$ would refer to $N_i$, the number of customers in the system encountered by the ith arrival. In this case,

$$E[W_i \mid S_i] = E[W_i \mid N_i] = (N_i + 1)\mu$$

This follows because the ith arrival will have to wait for $N_i$ service times, all having mean μ, and then to this we must add its own service time. (One of the $N_i$ service times is the completion of service of the customer presently being served when customer i arrives; by the memoryless property of the exponential, that remaining time will also be exponential with mean μ.) Thus, the estimator that takes the average value, over all simulation runs, of the quantity $\sum_i (N_i + 1)\mu$ is a better estimator than the average value of $\sum_i W_i$. □
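A Python sketch of Example 8o for the single-server case with exponential service follows. The sketch assumes Poisson arrivals with rate λ = 0.8, mean service time μ = 1, an initially empty system, and n = 10 customers; these choices are for illustration only. For each run it records $\sum_i W_i$ and $\sum_i (N_i + 1)\mu$, where $N_i$ counts the earlier customers still present when customer i arrives.

```python
import random


def one_run(n, lam, mean_service, rng):
    """Return (sum of W_i, sum of (N_i + 1) * mean_service) for the first n customers."""
    departures = []                     # FIFO single server: departure times are increasing
    t = 0.0
    total_w, total_cond = 0.0, 0.0
    for _ in range(n):
        t += rng.expovariate(lam)                      # arrival time of this customer
        n_i = sum(1 for d in departures if d > t)      # N_i: customers found in the system
        start = max(t, departures[-1]) if departures else t
        d = start + rng.expovariate(1 / mean_service)  # service time with mean mean_service
        departures.append(d)
        total_w += d - t                               # W_i, time in system
        total_cond += (n_i + 1) * mean_service         # E[W_i | N_i]
    return total_w, total_cond


def sample_var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


rng = random.Random(10)
runs = 5000
results = [one_run(10, 0.8, 1.0, rng) for _ in range(runs)]
raw = [w for w, _ in results]
cond = [c for _, c in results]
print(sum(raw) / runs, sum(cond) / runs)
print(sample_var(raw), sample_var(cond))
```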


¹ S. M. Ross, “Simulating Average Delay—Variance Reduction by Conditioning,” Probability Eng. Informational Sci. 2(3), 1988.

Our next example refers to the distribution of the number of nontarget cells that are not accidentally killed before a set of target cells have been destroyed.

Example 8p Consider a set of n + m cells, with cell i having weight $w_i$, $i = 1, \ldots, n + m$. Imagine that cells 1, ..., n are cancerous and that cells n + 1, ..., n + m are normal, and suppose cells are killed one at a time in the following fashion. If, at any time, S is the current set of cells that are still alive then, independent of the order in which the cells that are not in S have been killed, the next cell to be killed is cell i, i ∈ S, with probability $w_i / \sum_{j \in S} w_j$. Therefore, with probability $w_i / \sum_{j=1}^{n+m} w_j$ cell i is the first cell killed. Given that cell i is the first cell killed, the next cell killed will be cell k, k ≠ i, with probability $w_k / \sum_{j \ne i} w_j$, and so on. This process of killing cells continues until all of the first n cells (the cancer cells) have been killed. Let N denote the number of the normal cells that are still alive at the time when all the cancer cells have been killed. We are interested in determining $P\{N \ge k\}$.
interested in determining P{N > k}. 
Before attempting to develop an efficient simulation procedure, let us consider a related model in which cell i, i = 1, ..., n + m, is killed at the random time $T_i$, where $T_1, \ldots, T_{n+m}$ are independent exponential random variables with respective rates $w_1, \ldots, w_{n+m}$. By the lack of memory property of exponential random variables, it follows that if S is the set of cells that are currently alive then, as in the original model, cell i, i ∈ S, will be the next cell killed with probability $w_i / \sum_{j \in S} w_j$. This shows that the order in which cells are killed in this related model has the same probability distribution as it does in the original model. Let N represent the number of normal cells that are still alive when all cells 1, ..., n have been killed. Now, if we let $T_{(k)}$ be the kth largest of the values $T_{n+1}, \ldots, T_{n+m}$, then $T_{(k)}$ is the first time at which there are fewer than k normal cells alive. Thus, in order for N to be at least k, all the cancer cells must have been killed by time $T_{(k)}$. That is,

$$P\{N \ge k\} = P\left\{\max_{i \le n} T_i < T_{(k)}\right\}$$

Therefore,

$$P\{N \ge k \mid T_{(k)}\} = P\left\{\max_{i \le n} T_i < T_{(k)} \,\Big|\, T_{(k)}\right\} = \prod_{i=1}^{n} \left(1 - e^{-w_i T_{(k)}}\right)$$

where the final equality used the independence of the $T_i$. Hence, we obtain an unbiased, conditional expectation estimator of $P\{N \ge k\}$ by generating the m exponential random variables $T_{n+1}, \ldots, T_{n+m}$. Then letting $T_{(k)}$ be the kth largest of these values gives the estimator $\prod_{i=1}^{n} (1 - e^{-w_i T_{(k)}})$. Because this estimator is an increasing function of the generated $T_{n+1}, \ldots, T_{n+m}$, further variance reduction is possible provided the $T_i$ are obtained from the inverse transform method. For then the estimator will be a monotone function of the m random numbers used, indicating that antithetic variables will lead to further variance reduction. Putting it all together, the following gives a single run of the algorithm for estimating $P\{N \ge k\}$.
STEP 1: Generate random numbers $U_1, \ldots, U_m$.

STEP 2: Let $T_{(k)}$ be the kth largest of the m values $-\frac{1}{w_{n+i}}\log(U_i)$, $i = 1, \ldots, m$.

STEP 3: Let $S_{(k)}$ be the kth largest of the m values $-\frac{1}{w_{n+i}}\log(1 - U_i)$, $i = 1, \ldots, m$.

STEP 4: The estimator from this run is

$$\frac{1}{2}\left[\prod_{i=1}^{n} \left(1 - e^{-w_i T_{(k)}}\right) + \prod_{i=1}^{n} \left(1 - e^{-w_i S_{(k)}}\right)\right]$$
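STEPS 1 to 4 translate directly into code. The Python sketch below is illustrative: the weights (three cancer cells of weight 2, five normal cells of weight 1) and k = 2 are arbitrary, and the function names are hypothetical. Cells are indexed from 0, so cells 0 to n − 1 are the cancer cells. For comparison, the sketch also simulates the original cell-killing process directly. Both estimators target $P\{N \ge k\}$, so their averages should agree up to simulation error.

```python
import math
import random


def open_unit(rng):
    """A random number in (0, 1), so that log(u) and log(1 - u) are both finite."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def antithetic_conditional_run(w, n, k, rng):
    """One run of STEPS 1-4; cells 0..n-1 are cancerous, cells n..n+m-1 are normal."""
    m = len(w) - n
    us = [open_unit(rng) for _ in range(m)]                        # STEP 1
    t_vals = [-math.log(u) / w[n + i] for i, u in enumerate(us)]
    s_vals = [-math.log(1 - u) / w[n + i] for i, u in enumerate(us)]
    t_k = sorted(t_vals, reverse=True)[k - 1]                       # STEP 2: kth largest
    s_k = sorted(s_vals, reverse=True)[k - 1]                       # STEP 3: kth largest
    est_t = math.prod(1 - math.exp(-w[i] * t_k) for i in range(n))
    est_s = math.prod(1 - math.exp(-w[i] * s_k) for i in range(n))
    return (est_t + est_s) / 2                                      # STEP 4


def direct_indicator(w, n, k, rng):
    """Simulate the killing process itself; return 1 if N >= k, else 0."""
    alive = list(range(len(w)))
    cancer_left = n
    while cancer_left > 0:
        r = rng.random() * sum(w[i] for i in alive)
        acc, chosen = 0.0, alive[-1]        # fallback guards against round-off
        for i in alive:
            acc += w[i]
            if r < acc:
                chosen = i
                break
        alive.remove(chosen)                # cell `chosen` is killed
        if chosen < n:
            cancer_left -= 1
    return 1.0 if len(alive) >= k else 0.0  # every survivor is a normal cell


w = [2.0, 2.0, 2.0] + [1.0] * 5
n, k, runs = 3, 2, 20000
rng = random.Random(11)
cond = [antithetic_conditional_run(w, n, k, rng) for _ in range(runs)]
direct = [direct_indicator(w, n, k, rng) for _ in range(runs)]
print(sum(cond) / runs, sum(direct) / runs)
```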


Estimating the Expected Number of Renewals by Time t


Suppose that “events” are occurring randomly in time. Let $T_1$ denote the time of the first event, $T_2$ the time between the first and second event, and, in general, $T_n$ the time between the (n − 1)st and the nth event, n ≥ 1. If we let

$$S_n = \sum_{i=1}^{n} T_i$$

the first event occurs at time $S_1$, the second at time $S_2$, and, in general, the nth event occurs at time $S_n$ (see Figure 8.2). Let N(t) denote the number of events that occur by time t; that is, N(t) is the largest n for which the nth event occurs by time t, or, equivalently,

$$N(t) = \max\{n : S_n \le t\}$$

If the interevent times $T_1, T_2, \ldots$ are independent and identically distributed according to some distribution function F, then the process $\{N(t), t \ge 0\}$ is called a renewal process.

Figure 8.2. Event times $S_1$, $S_2$, $S_3$ on the time axis starting at 0; x = event.

A renewal process is easily simulated by generating the interarrival times. Suppose now that we wanted to use simulation to estimate θ = E[N(t)], the mean number of events by some fixed time t. To do so we would successively simulate the interevent times, keeping track of their sum (which represents the times at which events occur) until that sum exceeds t. That is, we keep on generating interevent times until we reach the first event time after t. Let N(t), the raw simulation estimator, denote the number of simulated events by time t. A natural quantity to use as a control variable is then the sum of the N(t) + 1 interevent times that were generated. That is, if we let μ denote the mean interevent time, then, since the random variables $T_i - \mu$ have mean 0 and N(t) + 1 is a stopping time, it follows from Wald's equation that

$$E\left[\sum_{i=1}^{N(t)+1} (T_i - \mu)\right] = 0$$
i=] 


Hence, we can control by using an estimator of the type 


N(t)+1 N(t)-+1 
moe] X a-w |=acore| E n-aon] 


= Mt) + c[S no — R(t) — u] 


Now since $S_n$ represents the time of the $n$th event and $N(t)+1$ represents the number of events by time $t$ plus 1, it follows that $S_{N(t)+1}$ represents the time of the first event after time $t$. Hence, if we let $Y(t)$ denote the time from $t$ until the next event [$Y(t)$ is commonly called the excess life at $t$], then

$S_{N(t)+1} = t + Y(t)$

and so the above controlled estimator can be written

$N(t) + c\left[t + Y(t) - \mu N(t) - \mu\right]$
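The best $c$ is given by

$c^* = -\dfrac{\operatorname{Cov}[N(t),\, Y(t) - \mu N(t)]}{\operatorname{Var}[Y(t) - \mu N(t)]}$

Now for $t$ large, it can be shown that the terms involving $N(t)$ dominate (because their variance will grow linearly with $t$, whereas the other terms will remain bounded), and so for $t$ large

$c^* \approx -\dfrac{\operatorname{Cov}[N(t),\, -\mu N(t)]}{\operatorname{Var}[-\mu N(t)]} = \dfrac{\mu \operatorname{Var}[N(t)]}{\mu^2 \operatorname{Var}[N(t)]} = \dfrac{1}{\mu}$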


Thus, for $t$ large, the best controlled estimator of the above type is close to

$N(t) + \dfrac{1}{\mu}\left(t + Y(t) - \mu N(t) - \mu\right) = \dfrac{t + Y(t)}{\mu} - 1 \qquad (8.7)$

In other words, for $t$ large, the critical value to be determined from the simulation is $Y(t)$, the time from $t$ until the next renewal.
[Figure 8.3. Age at $t$: $A(t)$ is the time from the last renewal before $t$ up to $t$.]


The above estimator can further be improved upon by the use of “conditioning.” Namely, rather than using the actual observed time of the first event after $t$, we can condition on $A(t)$, the time at $t$ since the last event (see Figure 8.3). The quantity $A(t)$ is often called the age of the renewal process at $t$. [If we imagine a system consisting of a single item that functions for a random time having distribution $F$ and then fails and is immediately replaced by a new item, then we have a renewal process with each event corresponding to the failure of an item. The variable $A(t)$ would then refer to the age of the item in use at time $t$, where by age we mean the amount of time it has already been in use.]

Now if the age of the process at time $t$ is $x$, the expected remaining life of the item is just the expected amount by which an interevent time exceeds $x$ given that it is greater than $x$. That is,

$E[Y(t) \mid A(t) = x] = E[T - x \mid T > x] = \displaystyle\int_x^{\infty} (y - x)\,\frac{f(y)}{1 - F(x)}\,dy \equiv \mu[x]$

where the above supposes that $F$ is a continuous distribution with density function $f$. Hence, with $\mu[x]$ defined as above to equal $E[T - x \mid T > x]$, we see that

$E[Y(t) \mid A(t)] = \mu[A(t)]$

Thus, for large $t$, a better estimator of $E[N(t)]$ than the one given in Equation (8.7) is

$\dfrac{t + \mu[A(t)]}{\mu} - 1 \qquad (8.8)$
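The following minimal Python sketch compares the raw estimator $N(t)$ with (8.7) and (8.8). For illustration only, it assumes that the interevent times are uniform on $(0, 1)$, so that $\mu = 1/2$ and $\mu[x] = (1 - x)/2$ for $0 \le x < 1$. The helper names `renewal_run`, `mean_residual`, and `estimators` are hypothetical, and time 0 is treated as a renewal when the age is computed.

```python
import random
import statistics

MU = 0.5  # mean of a Uniform(0, 1) interevent time (assumed distribution)

def mean_residual(x):
    """mu[x] = E[T - x | T > x] for T ~ Uniform(0, 1), valid for 0 <= x < 1."""
    return (1.0 - x) / 2.0

def renewal_run(t, rng):
    """Generate events until the first one after t.

    Returns (N(t), Y(t), A(t)): the number of events by t, the excess
    life at t, and the age at t (time 0 is treated as a renewal).
    """
    last, n = 0.0, 0
    while True:
        nxt = last + rng.random()          # time of the next event
        if nxt > t:
            return n, nxt - t, t - last
        last, n = nxt, n + 1

def estimators(t, runs, seed=1):
    rng = random.Random(seed)
    raw, ctrl, cond = [], [], []
    for _ in range(runs):
        n, y, a = renewal_run(t, rng)
        raw.append(n)                                  # raw estimator N(t)
        ctrl.append((t + y) / MU - 1)                  # controlled estimator (8.7)
        cond.append((t + mean_residual(a)) / MU - 1)   # conditional estimator (8.8)
    return raw, ctrl, cond

raw, ctrl, cond = estimators(t=10.0, runs=20000)
for name, xs in (("raw", raw), ("(8.7)", ctrl), ("(8.8)", cond)):
    print(name, round(statistics.mean(xs), 3), round(statistics.variance(xs), 4))
```

Under these assumptions, all three sample means should be near $E[N(10)] \approx t/\mu + (\sigma^2 - \mu^2)/(2\mu^2) = 20 - 1/3 \approx 19.67$. The sample variances should fall sharply from the raw estimator to (8.7), and fall again to (8.8).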


8.4 Stratified Sampling 


Suppose we want to estimate $\theta = E[X]$, and suppose there is some discrete random variable $Y$, with possible values $y_1, \ldots, y_k$, such that

(a) the probabilities $p_i = P\{Y = y_i\}$, $i = 1, \ldots, k$, are known; and
(b) for each $i = 1, \ldots, k$, we can simulate the value of $X$ conditional on $Y = y_i$.
Now if we are planning to estimate $E[X]$ by $n$ simulation runs, then the usual approach would be to generate $n$ independent replications of the random variable $X$ and then use $\bar{X}$, their average, as the estimate of $E[X]$. The variance of this estimator is

$\operatorname{Var}(\bar{X}) = \dfrac{1}{n}\operatorname{Var}(X)$

However, writing

$E[X] = \displaystyle\sum_{i=1}^{k} E[X \mid Y = y_i]\, p_i$


we see that another way of estimating $E[X]$ is by estimating the $k$ quantities $E[X \mid Y = y_i]$, $i = 1, \ldots, k$. For instance, suppose rather than generating $n$ independent replications of $X$, we do $np_i$ of the simulations conditional on the event that $Y = y_i$ for each $i = 1, \ldots, k$. If we let $\bar{X}_i$ be the average of the $np_i$ observed values of $X$ generated conditional on $Y = y_i$, then we would have the unbiased estimator

$\mathcal{E} = \displaystyle\sum_{i=1}^{k} \bar{X}_i\, p_i$

The estimator $\mathcal{E}$ is called a stratified sampling estimator of $E[X]$.

Because $\bar{X}_i$ is the average of $np_i$ independent random variables whose distribution is the same as the conditional distribution of $X$ given that $Y = y_i$, it follows that

$\operatorname{Var}(\bar{X}_i) = \dfrac{\operatorname{Var}(X \mid Y = y_i)}{np_i}$


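Consequently, using the preceding and that the $\bar{X}_i$, $i = 1, \ldots, k$, are independent, we see that

$\operatorname{Var}(\mathcal{E}) = \displaystyle\sum_{i=1}^{k} p_i^2 \operatorname{Var}(\bar{X}_i) = \frac{1}{n}\sum_{i=1}^{k} p_i \operatorname{Var}(X \mid Y = y_i) = \frac{1}{n} E[\operatorname{Var}(X \mid Y)]$

Because $\operatorname{Var}(\bar{X}) = \frac{1}{n}\operatorname{Var}(X)$, whereas $\operatorname{Var}(\mathcal{E}) = \frac{1}{n}E[\operatorname{Var}(X \mid Y)]$, we see from the conditional variance formula

$\operatorname{Var}(X) = E[\operatorname{Var}(X \mid Y)] + \operatorname{Var}(E[X \mid Y])$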
that the variance savings in using the stratified sampling estimator $\mathcal{E}$ over the usual raw simulation estimator is

$\operatorname{Var}(\bar{X}) - \operatorname{Var}(\mathcal{E}) = \dfrac{1}{n}\operatorname{Var}(E[X \mid Y])$

That is, the variance savings per run is $\operatorname{Var}(E[X \mid Y])$, which can be substantial when the value of $Y$ strongly affects the conditional expectation of $X$.

Remark The variance of the stratified sampling estimator can be estimated by letting $S_i^2$ be the sample variance of the $np_i$ runs done conditional on $Y = y_i$, $i = 1, \ldots, k$. Then $S_i^2$ is an unbiased estimator of $\operatorname{Var}(X \mid Y = y_i)$, yielding that $\frac{1}{n}\sum_{i=1}^{k} p_i S_i^2$ is an unbiased estimator of $\operatorname{Var}(\mathcal{E})$.

Example 8q On good days customers arrive at an infinite server queue according to a Poisson process with rate 12 per hour, whereas on other days they arrive according to a Poisson process with rate 4 per hour. The service times, on all days, are exponentially distributed with rate 1 per hour. Every day at time 10 hours the system is shut down and all those presently in service are forced to leave without completing service. Suppose that each day is, independently, a good day with probability 0.5 and that we want to use simulation to estimate $\theta$, the mean number of customers per day that do not have their services completed.

Let $X$ denote the number of customers whose service is not completed on a randomly selected day; let $Y$ equal 0 if the day is ordinary, and let it equal 1 if the day is good. Then it can be shown that the conditional distributions of $X$ given that $Y = 0$ and that $Y = 1$ are both Poisson, with respective means

$E[X \mid Y = 0] = 4(1 - e^{-10}), \qquad E[X \mid Y = 1] = 12(1 - e^{-10})$

Because the variance of a Poisson random variable is equal to its mean, the preceding shows that

$\operatorname{Var}(X \mid Y = 0) = E[X \mid Y = 0] \approx 4$
$\operatorname{Var}(X \mid Y = 1) = E[X \mid Y = 1] \approx 12$

Thus,

$E[\operatorname{Var}(X \mid Y)] \approx \tfrac{1}{2}(4 + 12) = 8$

and

$\operatorname{Var}(E[X \mid Y]) = E[(E[X \mid Y])^2] - (E[X])^2 \approx \dfrac{4^2 + 12^2}{2} - 8^2 = 16$

Consequently,

$\operatorname{Var}(X) \approx 8 + 16 = 24$

which is about 3 times as large as $E[\operatorname{Var}(X \mid Y)]$, the variance of the stratified sampling estimator that simulates exactly half the days as good days and the other half as ordinary days. ■

Again suppose that the probability mass function $p_i = P\{Y = y_i\}$, $i = 1, \ldots, k$, is known, that we can simulate $X$ conditional on $Y = y_i$, and that we plan to do $n$ simulation runs. Although performing $np_i$ of the $n$ runs conditional on $Y = y_i$, $i = 1, \ldots, k$, is better than generating $n$ independent replications of $X$, these are not necessarily the optimal numbers of conditional runs to perform. Suppose we plan to do $n_i$ runs conditional on $Y = y_i$, where $n = \sum_{i=1}^{k} n_i$. Then, with $\bar{X}_i$ equal to the average of the $n_i$ runs conditional on $Y = y_i$, the stratified sampling estimator is

$\mathcal{E} = \displaystyle\sum_{i=1}^{k} p_i \bar{X}_i$

with its variance given by

$\operatorname{Var}(\mathcal{E}) = \displaystyle\sum_{i=1}^{k} p_i^2 \operatorname{Var}(X \mid Y = y_i)/n_i$

Whereas the quantities $\operatorname{Var}(X \mid Y = y_i)$, $i = 1, \ldots, k$, will be initially unknown, we could perform a small simulation study to estimate them, say by the estimators $s_i^2$. We could then choose the $n_i$ by solving the following optimization problem:

choose $n_1, \ldots, n_k$ such that $\sum_{i=1}^{k} n_i = n$ to minimize $\sum_{i=1}^{k} p_i^2 s_i^2 / n_i$

Using Lagrange multipliers, it is easy to show that the optimal values of the $n_i$ in the preceding optimization problem are

$n_i = n\,\dfrac{p_i s_i}{\sum_{j=1}^{k} p_j s_j}, \qquad i = 1, \ldots, k$

Once the $n_i$ are determined and the simulations performed, we would estimate $E[X]$ by $\sum_{i=1}^{k} p_i \bar{X}_i$, and we would estimate the variance of this estimator by $\sum_{i=1}^{k} p_i^2 S_i^2 / n_i$, where $S_i^2$ is the sample variance of the $n_i$ runs done conditional on $Y = y_i$, $i = 1, \ldots, k$.
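The following sketch illustrates Example 8q numerically. It simulates the infinite server queue directly and compares three estimators: the raw estimator, the stratified estimator with proportional allocation ($np_i$ runs per stratum), and the allocation $n_i \propto p_i s_i$. For simplicity, the $s_i$ are taken to be the known conditional standard deviations 2 and $\sqrt{12}$ instead of pilot-run estimates. The function names are hypothetical, and rounding the $n_i$ by largest remainders is just one reasonable choice.

```python
import math
import random
import statistics

def unfinished(rate, rng, closing=10.0):
    """Customers still in service at closing time on one day of Example 8q
    (Poisson arrivals at `rate`, exponential(1) service, infinite servers)."""
    count = 0
    t = rng.expovariate(rate)
    while t < closing:
        if t + rng.expovariate(1.0) > closing:
            count += 1
        t += rng.expovariate(rate)
    return count

def raw_estimate(n, rng):
    """n independent days, each good with probability 0.5; returns
    (estimate, estimated variance of the estimate)."""
    xs = [unfinished(12.0 if rng.random() < 0.5 else 4.0, rng) for _ in range(n)]
    return statistics.mean(xs), statistics.variance(xs) / n

def optimal_allocation(n, p, s):
    """n_i proportional to p_i * s_i, rounded so that the n_i sum to n."""
    weights = [pi * si for pi, si in zip(p, s)]
    exact = [n * w / sum(weights) for w in weights]
    alloc = [math.floor(x) for x in exact]
    # hand the leftover runs to the largest fractional parts
    order = sorted(range(len(p)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc

def stratified_estimate(allocation, rng, p=(0.5, 0.5), rates=(4.0, 12.0)):
    """Stratified estimator sum p_i Xbar_i with variance estimate
    sum p_i^2 S_i^2 / n_i (strata: ordinary day, good day)."""
    est = var = 0.0
    for pi, ni, rate in zip(p, allocation, rates):
        xs = [unfinished(rate, rng) for _ in range(ni)]
        est += pi * statistics.mean(xs)
        var += pi ** 2 * statistics.variance(xs) / ni
    return est, var

rng = random.Random(2024)
n = 2000
print("raw         ", raw_estimate(n, rng))
print("proportional", stratified_estimate([n // 2, n // 2], rng))
print("optimal     ", stratified_estimate(
    optimal_allocation(n, [0.5, 0.5], [2.0, math.sqrt(12.0)]), rng))
```

The reported variance estimates should be roughly $24/n$ for the raw estimator and $8/n$ for proportional allocation. For the optimal allocation they should be about $7.5/n$, since $(\sum_i p_i \sigma_i)^2 = (1 + \sqrt{3})^2 \approx 7.46$.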
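For another illustration of stratified sampling, suppose that we want to use $n$ simulation runs to estimate

$\theta = E[h(U)] = \displaystyle\int_0^1 h(x)\,dx$

If we let

$I = j \quad \text{if } \dfrac{j-1}{n} \le U < \dfrac{j}{n}, \qquad j = 1, \ldots, n$

then

$\theta = \dfrac{1}{n}\displaystyle\sum_{j=1}^{n} E[h(U) \mid I = j] = \frac{1}{n}\sum_{j=1}^{n} E[h(U^{(j)})]$

where $U^{(j)}$ is uniform on $((j-1)/n, j/n)$. Hence, by the preceding, it follows that rather than generating $U_1, \ldots, U_n$ and then using $\sum_{j=1}^{n} h(U_j)/n$ to estimate $\theta$, a better estimator is obtained by using

$\hat{\theta} = \dfrac{1}{n}\displaystyle\sum_{j=1}^{n} h\!\left(\frac{U_j + j - 1}{n}\right)$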


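Example 8r In Example 8k we showed that

$\dfrac{\pi}{4} = E\left[\sqrt{1 - U^2}\right]$

Hence, we can estimate $\pi$ by generating $U_1, \ldots, U_n$ and using the estimator

$\text{est} = \dfrac{4}{n}\displaystyle\sum_{j=1}^{n} \sqrt{1 - [(U_j + j - 1)/n]^2}$

In fact, we can improve the preceding by making use of antithetic variables to obtain the estimator

$\hat{\pi} = \dfrac{2}{n}\displaystyle\sum_{j=1}^{n} \left( \sqrt{1 - [(U_j + j - 1)/n]^2} + \sqrt{1 - [(j - U_j)/n]^2} \right)$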
A simulation using the estimator $\hat{\pi}$ yielded the following results:

n       π̂
5       3.161211
10      3.148751
100     3.141734
500     3.141615
1000    3.141601
5000    3.141593

When $n = 5000$, the estimator $\hat{\pi}$ is correct to six decimal places. ■
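Here is a short Python sketch of the two estimators of Example 8r: the stratified estimator "est" and its antithetic version $\hat{\pi}$. Because it uses its own random numbers, its output will not reproduce the table above digit for digit. The function names are illustrative.

```python
import math
import random

def h(x):
    return math.sqrt(1.0 - x * x)

def pi_stratified(n, rng):
    """est = 4/n * sum_j h((U_j + j - 1)/n): one uniform per stratum."""
    return 4.0 / n * sum(h((rng.random() + j - 1) / n) for j in range(1, n + 1))

def pi_stratified_antithetic(n, rng):
    """pi_hat = 2/n * sum_j [h((U_j + j - 1)/n) + h((j - U_j)/n)]."""
    total = 0.0
    for j in range(1, n + 1):
        u = rng.random()
        total += h((u + j - 1) / n) + h((j - u) / n)   # U_j and its antithetic 1 - U_j
    return 2.0 / n * total

rng = random.Random(8)
for n in (5, 10, 100, 500, 1000, 5000):
    print(n, round(pi_stratified_antithetic(n, rng), 6))
```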


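Remarks

1. Suppose we want to use simulation to estimate $E[X]$ by stratifying on the values of $Y$, a continuous random variable having distribution function $G$. To perform the stratification, we would first generate the value of $Y$ and then simulate $X$ conditional on this value of $Y$. Say we use the inverse transform method to generate $Y$; that is, we obtain $Y$ by generating a random number $U$ and setting $Y = G^{-1}(U)$. If we plan to do $n$ simulation runs, then rather than using $n$ independent random numbers to generate the successive values of $Y$, one could stratify by letting the $i$th random number be the value of a random variable that is uniform in the region $\left(\frac{i-1}{n}, \frac{i}{n}\right)$. In this manner we obtain the value of $Y$ in run $i$ (call it $Y_i$) by generating a random number $U_i$ and setting $Y_i = G^{-1}\!\left(\frac{U_i + i - 1}{n}\right)$. We would then obtain $X_i$, the value of $X$ in run $i$, by simulating $X$ conditional on $Y$ equal to the observed value of $Y_i$. The random variable $X_i$ would then be an unbiased estimator of $E\left[X \,\middle|\, G^{-1}\!\left(\frac{i-1}{n}\right) < Y \le G^{-1}\!\left(\frac{i}{n}\right)\right]$, yielding that $\frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased estimator of

$\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} E\left[X \,\middle|\, G^{-1}\!\left(\tfrac{i-1}{n}\right) < Y \le G^{-1}\!\left(\tfrac{i}{n}\right)\right]$
$\quad = \displaystyle\sum_{i=1}^{n} E\left[X \,\middle|\, \tfrac{i-1}{n} < G(Y) \le \tfrac{i}{n}\right] \frac{1}{n}$
$\quad = \displaystyle\sum_{i=1}^{n} E\left[X \,\middle|\, \tfrac{i-1}{n} < G(Y) \le \tfrac{i}{n}\right] P\left\{\tfrac{i-1}{n} < G(Y) \le \tfrac{i}{n}\right\}$
$\quad = E[X]$

where the penultimate equation used that $G(Y)$ is uniform on $(0, 1)$.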


2. Suppose that we have simulated $n$ independent replications of $X$ without doing any stratification, and that in $n_i$ of the simulation runs the resulting value of $Y$ was $y_i$, $\sum_{i=1}^{k} n_i = n$. If we let $\bar{X}_i$ denote the average of the $n_i$ runs in which $Y = y_i$, then $\bar{X}$, the average value of $X$ over all $n$ runs, can be written as

$\bar{X} = \dfrac{1}{n}\displaystyle\sum_{j=1}^{n} X_j = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{X}_i = \sum_{i=1}^{k} \frac{n_i}{n}\,\bar{X}_i$

When written this way, it is clear that using $\bar{X}$ to estimate $E[X]$ is equivalent to estimating $E[X \mid Y = y_i]$ by $\bar{X}_i$ and estimating $p_i$ by $n_i/n$ for each $i = 1, \ldots, k$. But since the $p_i$ are known, and so need not be estimated, it would seem that a better estimator of $E[X]$ than $\bar{X}$ would be the estimator $\sum_{i=1}^{k} p_i \bar{X}_i$. In other words, we should act as if we had decided in advance to do stratified sampling, with $n_i$ of our simulation runs to be done conditional on $Y = y_i$, $i = 1, \ldots, k$. This method of stratifying after the fact is called poststratification. ■
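The two remarks can be sketched in a few lines of Python. For Remark 1, the sketch assumes, hypothetically, that $Y$ is exponential with rate 1 (so $G^{-1}(u) = -\log(1 - u)$) and that $X$ given $Y$ is normal with mean $Y$ and variance 1, which gives $E[X] = 1$. For Remark 2, it assumes an unstratified sample of $(y, x)$ pairs with known $p_i$. The models and function names are illustrative, not from the text.

```python
import math
import random
import statistics

def inverse_transform_stratified(n, rng):
    """Remark 1: run i uses Y_i = G^{-1}((U_i + i - 1)/n), here with the
    assumed G^{-1}(u) = -log(1 - u) and X | Y ~ Normal(Y, 1)."""
    total = 0.0
    for i in range(1, n + 1):
        y = -math.log(1.0 - (rng.random() + i - 1) / n)  # Y_i in stratum i
        total += rng.gauss(y, 1.0)                        # X_i given Y_i
    return total / n

def poststratified(samples, p):
    """Remark 2: replace the weights n_i / n in Xbar by the known p_i.
    `samples` holds (y, x) pairs from an unstratified simulation."""
    by_stratum = {}
    for y, x in samples:
        by_stratum.setdefault(y, []).append(x)
    if set(by_stratum) != set(p):
        raise ValueError("every stratum must be observed at least once")
    return sum(p[y] * statistics.mean(xs) for y, xs in by_stratum.items())

rng = random.Random(5)
p = {0: 0.2, 1: 0.5, 2: 0.3}            # hypothetical known P{Y = y}

def draw():
    y = rng.choices(list(p), weights=list(p.values()))[0]
    return y, rng.gauss(3.0 * y, 1.0)   # hypothetical X | Y = y ~ Normal(3y, 1)

samples = [draw() for _ in range(1000)]
print(statistics.mean(x for _, x in samples), poststratified(samples, p))  # both near 3.3
print(inverse_transform_stratified(1000, rng))                           # near 1
```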


Suppose again that we are interested in estimating $\theta = E[X]$, where $X$ is dependent on the random variable $S$, which takes on one of the values $1, 2, \ldots, k$ with respective probabilities $p_i$, $i = 1, \ldots, k$. Then

$E[X] = p_1 E[X \mid S = 1] + p_2 E[X \mid S = 2] + \cdots + p_k E[X \mid S = k]$

If all of the quantities $E[X \mid S = i]$ are known (that is, if $E[X \mid S]$ is known), but the $p_i$ are not, then we can estimate $\theta$ by generating the value of $S$ and then using the conditional expectation estimator $E[X \mid S]$. On the other hand, if it is the $p_i$ that are known and we can generate from the conditional distribution of $X$ given the value of $S$, then we can use simulation to obtain estimators $\hat{E}[X \mid S = i]$ of the quantities $E[X \mid S = i]$ and then use the stratified sampling estimator $\sum_{i=1}^{k} p_i \hat{E}[X \mid S = i]$ to estimate $E[X]$. When some of the $p_i$ and some of the $E[X \mid S = i]$ are known, we can use a combination of these approaches.


Example 8s In the game of video poker a player inserts one dollar into a machine, which then deals the player a random hand of five cards. The player is then allowed to discard certain of these cards, with the discarded cards replaced by new ones from the remaining 47 cards. The player is then returned a certain amount depending on the makeup of her or his final cards. The following is a typical payoff scheme:


Hand                          Payoff
Royal flush                   800
Straight flush                50
Four of a kind                25
Full house                    8
Flush                         5
Straight                      4
Three of a kind               3
Two pair                      2
High pair (jacks or better)   1
Anything else                 0


In the preceding, a hand is characterized as being in a certain category if it is of 
that type and not of any higher type. That is, for instance, by a flush we mean 
five cards of the same suit that are not consecutive. 

Consider a strategy that never takes any additional cards (that is, the player stands pat) if the original cards constitute a straight or higher, and that always retains whatever pairs or triplets it is dealt. For a given strategy of this type let $X$ denote the player’s winnings on a single hand, and suppose we are interested in estimating $\theta = E[X]$. Rather than just using $X$ as the estimator, let us start by conditioning on the type of hand that is initially dealt to the player. Let $R$ represent a royal flush, $S$ represent a straight flush, 4 represent four of a kind, 3 represent three of a kind, 2 represent two pair, 1 represent a high pair, 0 represent a low pair, and “other” represent all other hands not mentioned. We then have


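$E[X] = E[X \mid R]P\{R\} + E[X \mid S]P\{S\} + E[X \mid 4]P\{4\} + E[X \mid \text{full}]P\{\text{full}\}$
$\qquad + E[X \mid \text{flush}]P\{\text{flush}\} + E[X \mid \text{straight}]P\{\text{straight}\} + E[X \mid 3]P\{3\}$
$\qquad + E[X \mid 2]P\{2\} + E[X \mid 1]P\{1\} + E[X \mid 0]P\{0\} + E[X \mid \text{other}]P\{\text{other}\}$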


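Now, with $C = 1/\binom{52}{5}$, we have

$P\{R\} = 4C = 1.539 \times 10^{-6}$
$P\{S\} = 4 \cdot 9 \cdot C = 1.3851 \times 10^{-5}$
$P\{4\} = 13 \cdot 48 \cdot C = 2.40096 \times 10^{-4}$
$P\{\text{full}\} = 13 \cdot 12 \binom{4}{3}\binom{4}{2} C = 1.440576 \times 10^{-3}$
$P\{\text{flush}\} = 4\left(\binom{13}{5} - 10\right) C = 1.965402 \times 10^{-3}$
$P\{\text{straight}\} = 10(4^5 - 4) C = 3.924647 \times 10^{-3}$
$P\{3\} = 13\binom{4}{3}\binom{12}{2} 4^2 C = 2.1128451 \times 10^{-2}$
$P\{2\} = \binom{13}{2}\binom{4}{2}\binom{4}{2} \cdot 44 \cdot C = 4.7539016 \times 10^{-2}$
$P\{1\} = 4\binom{4}{2}\binom{12}{3} 4^3 C = 0.130021239$
$P\{0\} = 9\binom{4}{2}\binom{12}{3} 4^3 C = 0.292547788$
$P\{\text{other}\} = 1 - P\{R\} - P\{S\} - P\{\text{full}\} - P\{\text{flush}\} - P\{\text{straight}\} - \sum_{i=0}^{4} P\{i\} = 0.5010527$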


Therefore, we see that

$E[X] = 0.0512903 + \displaystyle\sum_{i=0}^{3} E[X \mid i]\, P\{i\} + E[X \mid \text{other}]\,(0.5010527)$


Now, $E[X \mid 3]$ can be analytically computed by noting that the 2 new cards will come from a subdeck of 47 cards that contains 1 card of one denomination (namely the denomination to which your three of a kind belong), 3 cards of two denominations, and 4 cards of the other 10 denominations. Thus, letting $F$ be the final hand, we have that

$P\{F = 4 \mid \text{dealt } 3\} = \dfrac{46}{\binom{47}{2}} = 0.042553191$
$P\{F = \text{full} \mid \text{dealt } 3\} = \dfrac{2 \cdot 3 + 10 \cdot 6}{\binom{47}{2}} = 0.061054579$
$P\{F = 3 \mid \text{dealt } 3\} = 1 - 0.042553191 - 0.061054579 = 0.89639223$

Hence,

$E[X \mid 3] = 25(0.042553191) + 8(0.061054579) + 3(0.89639223) = 4.241443097$
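The arithmetic for $E[X \mid 3]$ can be checked with exact rational arithmetic. This short Python sketch uses `fractions.Fraction` and `math.comb`. It gives $E[X \mid 3] = 4585/1081 \approx 4.2414431$, which agrees with the value above up to the rounding of the intermediate decimals.

```python
from fractions import Fraction
from math import comb

pairs = comb(47, 2)                    # ways to draw the 2 replacement cards
# four of a kind: the last card of the triplet's denomination plus any of the other 46
p_four = Fraction(46, pairs)
# full house: the 2 new cards pair up (2 denominations with 3 left, 10 with 4 left)
p_full = Fraction(2 * comb(3, 2) + 10 * comb(4, 2), pairs)
p_three = 1 - p_four - p_full          # otherwise the hand stays three of a kind
e_x_given_3 = 25 * p_four + 8 * p_full + 3 * p_three
print(float(p_four), float(p_full), float(p_three), float(e_x_given_3))
```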


Similarly, we can analytically derive (and the derivation is left as an exercise) $E[X \mid i]$ for $i = 0, 1, 2$.

In running the simulation, we should thus generate a hand. If it contains at least one pair or a higher hand then it should be discarded and the process begun again. When we are dealt a hand that does not contain a pair (or any higher hand), we should use whatever strategy we are employing to discard and receive new cards. If $X_o$ is the payoff on this hand, then $X_o$ is the estimator of $E[X \mid \text{other}]$, and the estimator of $\theta = E[X]$ based on this single run is

$\hat{\theta} = 0.0512903 + 0.021128451(4.241443097) + 0.047539016\,E[X \mid 2]$
$\qquad + 0.130021239\,E[X \mid 1] + 0.292547788\,E[X \mid 0] + 0.5010527\,X_o$

Note that the variance of the estimator is

$\operatorname{Var}(\hat{\theta}) = (0.5010527)^2 \operatorname{Var}(X_o)$ ■


Remarks 


1. We have supposed that the strategy employed always sticks with a pat hand 
and always keeps whatever pairs it has. However, for the payoffs given this 
is not an optimal strategy. For instance, if one is dealt 2, 10, jack, queen, 
king, all of spades, then rather than standing with this flush it is better to 
discard the 2 and draw another card (why is that?). Also, if dealt 10, jack,
queen, king, all of spades, along with the 10 of hearts, it is better to discard 
the 10 of hearts and draw 1 card than it is to keep the pair of 10s. 

2. We could have made further use of stratified sampling by breaking up the 
“other” category into, say, those “other” hands that contain four cards of the 
same suit, and those that do not. It is not difficult to analytically compute 
the probability that a hand will be without a pair and with four cards of the 
same suit. We could then use simulation to estimate the conditional expected 
payoffs in these two “other” cases. ■


8.5 Applications of Stratified Sampling 


In the following subsections, we show how to use ideas of stratified sampling 
when analyzing systems having Poisson arrivals, monotone functions of many 
variables, and compound random vectors. 

In 8.5.1 we consider a model in which arrivals occur according to a Poisson 
process, and then we present an efficient way to estimate the expected value 
of a random variable whose mean depends on the arrival process only through 
arrivals up to some specified time. In 8.5.2 we show how to use stratified 
sampling to efficiently estimate the expected value of a nondecreasing function 
of random numbers. In 8.5.3 we define the concept of a compound random 
vector and show how to efficiently estimate the expectation of a function of this 
vector. 
8.5.1 Analyzing Systems Having Poisson Arrivals


Consider a system in which arrivals occur according to a Poisson process with rate $\lambda$ and suppose we are interested in using simulation to compute $E[D]$, where the value of $D$ depends on the arrival process only through those arrivals before time $t$. For instance, $D$ might be the sum of the delays of all arrivals by time $t$ in a parallel multiserver queueing system. We suggest the following approach to using simulation to estimate $E[D]$. First, with $N(t)$ equal to the number of arrivals by time $t$, note that for any specified integral value $m$

$E[D] = \displaystyle\sum_{j=0}^{m} E[D \mid N(t) = j]\, e^{-\lambda t}(\lambda t)^j/j! + E[D \mid N(t) > m]\left(1 - \sum_{j=0}^{m} e^{-\lambda t}(\lambda t)^j/j!\right) \qquad (8.9)$


Let us suppose that $E[D \mid N(t) = 0]$ can be easily computed and also that $D$ can be determined by knowing the arrival times along with the service time of each arrival.

Each run of our suggested simulation procedure will generate an independent estimate of $E[D]$. Moreover, each run will consist of $m+1$ stages, with stage $j$ producing an unbiased estimator of $E[D \mid N(t) = j]$, for $j = 1, \ldots, m$, and with stage $m+1$ producing an unbiased estimator of $E[D \mid N(t) > m]$. Each succeeding stage will make use of data from the previous stage along with any additionally needed data, which in stages $2, \ldots, m$ will be another arrival time and another service time. To keep track of the current arrival times, each stage will have a set $S$ whose elements are arranged in increasing value and which represents the set of arrival times. To go from one stage to the next, we make use of the fact that conditional on there being a total of $j$ arrivals by time $t$, the set of $j$ arrival times is distributed as $j$ independent uniform $(0, t)$ random variables. Thus, the set of arrival times conditional on $j+1$ events by time $t$ is distributed as the set of arrival times conditional on $j$ events by time $t$ along with a new independent uniform $(0, t)$ random variable.

A run is as follows: 


STEP 1: Let $N = 1$. Generate a random number $U_1$, and let $S = \{tU_1\}$.

STEP 2: Suppose $N(t) = 1$, with the arrival occurring at time $tU_1$. Generate the service time of this arrival, and compute the resulting value of $D$. Call this value $D_1$.

STEP 3: Let $N = N + 1$.

STEP 4: Generate a random number $U_N$, and add $tU_N$ in its appropriate place to the set $S$ so that the elements in $S$ are in increasing order.

STEP 5: Suppose $N(t) = N$, with $S$ specifying the $N$ arrival times; generate the service time of the arrival at time $tU_N$ and, using the previously generated service times of the other arrivals, compute the resulting value of $D$. Call this value $D_N$.

STEP 6: If $N < m$ return to Step 3. If $N = m$, use the inverse transform method to generate the value of $N(t)$ conditional on it exceeding $m$. If the generated value is $m + k$, generate $k$ additional random numbers, multiply each by $t$, and add these $k$ numbers to the set $S$. Generate the service times of these $k$ arrivals and, using the previously generated service times, compute $D$. Call this value $D_{>m}$.


With $D_0 = E[D \mid N(t) = 0]$, the estimate from this run is

$\mathcal{E} = \displaystyle\sum_{j=0}^{m} D_j\, e^{-\lambda t}(\lambda t)^j/j! + D_{>m}\left(1 - \sum_{j=0}^{m} e^{-\lambda t}(\lambda t)^j/j!\right) \qquad (8.10)$

Because the set of unordered arrival times, given that $N(t) = j$, is distributed as a set of $j$ independent uniform $(0, t)$ random variables, it follows that

$E[D_j] = E[D \mid N(t) = j], \qquad E[D_{>m}] = E[D \mid N(t) > m]$

thus showing that $\mathcal{E}$ is an unbiased estimator of $E[D]$. Generating multiple runs and taking the average value of the resulting estimates yields the final simulation estimator.
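The following Python sketch implements Steps 1 through 6 and estimator (8.10). It makes several hypothetical choices for illustration. $D$ is the total queueing delay in a FIFO single-server queue, computed with the Lindley recursion, so $D_0 = 0$. Service times are exponential with rate 2. The parameters are $\lambda = 1$, $t = 10$, and $m = \lceil \lambda t + 5\sqrt{\lambda t} \rceil$. All function names are hypothetical. For simplicity, `total_delay` recomputes $D$ from scratch at each stage instead of reusing earlier delays as Remark 2 below suggests.

```python
import bisect
import math
import random
import statistics

def total_delay(jobs):
    """Sum of waits in queue, FIFO single server; jobs = sorted (arrival, service)
    pairs. Lindley: W_{i+1} = max(0, W_i + S_i - (A_{i+1} - A_i))."""
    total, wait = 0.0, 0.0
    for (a0, s0), (a1, _) in zip(jobs, jobs[1:]):
        wait = max(0.0, wait + s0 - (a1 - a0))
        total += wait
    return total

def poisson_beyond(mean, m, rng):
    """Inverse transform for Poisson(mean) conditioned to exceed m
    (assumes m + 1 > mean, so the probabilities decrease from m + 1 on)."""
    first = math.exp(-mean + (m + 1) * math.log(mean) - math.lgamma(m + 2))
    tail, q, k = 0.0, first, m + 1
    while q > 1e-17 * (tail + q):          # P{N > m}, summed term by term
        tail += q
        k += 1
        q *= mean / k
    target, k, q, cum = rng.random() * tail, m + 1, first, first
    while cum < target and q > 0.0:
        k += 1
        q *= mean / k
        cum += q
    return k

def staged_run(lam, t, m, rng, service):
    """One run of Steps 1-6; returns the estimate (8.10) with D_0 = 0."""
    pois = [math.exp(-lam * t)]
    for j in range(1, m + 1):
        pois.append(pois[-1] * lam * t / j)          # P{N(t) = j}
    jobs, est = [], 0.0
    for j in range(1, m + 1):                         # stages 1, ..., m
        bisect.insort(jobs, (t * rng.random(), service(rng)))
        est += pois[j] * total_delay(jobs)            # D_j
    for _ in range(poisson_beyond(lam * t, m, rng) - m):   # stage m + 1
        bisect.insort(jobs, (t * rng.random(), service(rng)))
    return est + (1.0 - sum(pois)) * total_delay(jobs)      # D_{>m} term

def raw_run(lam, t, rng, service):
    """Raw estimator: simulate the Poisson arrivals directly and return D."""
    jobs, a = [], rng.expovariate(lam)
    while a <= t:
        jobs.append((a, service(rng)))
        a += rng.expovariate(lam)
    return total_delay(jobs)

lam, t = 1.0, 10.0
m = math.ceil(lam * t + 5 * math.sqrt(lam * t))
svc = lambda r: r.expovariate(2.0)                   # hypothetical service distribution
rng = random.Random(3)
staged = [staged_run(lam, t, m, rng, svc) for _ in range(2000)]
raw = [raw_run(lam, t, rng, svc) for _ in range(2000)]
for name, xs in (("staged", staged), ("raw", raw)):
    print(name, statistics.mean(xs), statistics.variance(xs))
```

The two sample means should agree within sampling error. Consistent with the theorem proved below, the staged estimator's sample variance should come out smaller than that of the raw runs.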


Remarks

1. It should be noted that the variance of our estimator $\sum_{j=0}^{m} D_j e^{-\lambda t}(\lambda t)^j/j! + D_{>m}\left(1 - \sum_{j=0}^{m} e^{-\lambda t}(\lambda t)^j/j!\right)$ is, because of the positive correlations introduced by reusing the same data, larger than it would be if the $D_j$ were independent estimators. However, the increased speed of the simulation should more than make up for this increased variance.

2. When computing $D_{j+1}$, we can make use of quantities used in computing $D_j$. For instance, suppose $D_{i,j}$ was the delay of arrival $i$ when $N(t) = j$. If the new arrival time $tU_{j+1}$ is the $k$th smallest of the new set $S$, then $D_{i,j+1} = D_{i,j}$ for $i < k$.

3. Other variance reduction ideas can be used in conjunction with our approach. For instance, we can improve the estimator by using a linear combination of the service times as a control variable. ■
It remains to determine an appropriate value of $m$. A reasonable approach might be to choose $m$ to make

$E[D \mid N(t) > m]\, P\{N(t) > m\} = E[D \mid N(t) > m]\left(1 - \sum_{j=0}^{m} e^{-\lambda t}(\lambda t)^j/j!\right)$

sufficiently small. Because $\operatorname{Var}(N(t)) = \lambda t$, a reasonable choice would be of the form

$m = \lambda t + k\sqrt{\lambda t}$

for some positive number $k$.

To determine the appropriate value of $k$, we can try to bound $E[D \mid N(t) > m]$ and then use this bound to determine the appropriate value of $k$ (and $m$). For instance, suppose $D$ is the sum of the delays of all arrivals by time $t$ in a single server system with mean service time 1. Then because this quantity will be maximized when all arrivals come simultaneously, we see that

$E[D \mid N(t)] \le \displaystyle\sum_{i=1}^{N(t)-1} i$

Because the conditional distribution of $N(t)$ given that it exceeds $m$ will, when $m$ is at least 5 standard deviations greater than $E[N(t)]$, put most of its weight near $m + 1$, we see from the preceding that one can reasonably assume that, for $k > 5$,

$E[D \mid N(t) > m] \le (m+1)^2/2$


Using that, for a standard normal random variable $Z$ (see Sec. 3.3 of Ross, Probability Models for Computer Science, Academic Press, 2002),

$P\{Z > x\} \le \dfrac{e^{-x^2/2}}{x\sqrt{2\pi}}, \qquad x > 0$

we see, upon using the normal approximation to the Poisson, that for $k > 5$ and $m = \lambda t + k\sqrt{\lambda t}$, we can reasonably assume that

$E[D \mid N(t) > m]\, P\{N(t) > m\} \le (m+1)^2\, \dfrac{e^{-k^2/2}}{2k\sqrt{2\pi}}$

For instance, with $\lambda t = 10^3$ and $k = 6$, the preceding upper bound is about .0008.

We will end this subsection by proving that the estimator $\mathcal{E}$ has a smaller variance than does the raw simulation estimator $D$.
Theorem

$\operatorname{Var}(\mathcal{E}) \le \operatorname{Var}(D)$

Proof We will prove the result by showing that $\mathcal{E}$ can be expressed as a conditional expectation of $D$ given some random vector. To show this, we will utilize the following approach for simulating $D$:

STEP 1: Generate the value of $N'$, a random variable whose distribution is the same as that of $N(t)$ conditioned to exceed $m$. That is,

$P\{N' = k\} = \dfrac{e^{-\lambda t}(\lambda t)^k/k!}{P\{N(t) > m\}}, \qquad k > m$

STEP 2: Generate the values of $A_1, \ldots, A_{N'}$, independent uniform $(0, t)$ random variables.

STEP 3: Generate the values of $S_1, \ldots, S_{N'}$, independent service time random variables.

STEP 4: Generate the value of $N(t)$, a Poisson random variable with mean $\lambda t$.

STEP 5: If $N(t) = j \le m$, use the arrival times $A_1, \ldots, A_j$ along with their service times $S_1, \ldots, S_j$ to compute the value of $D = D_j$.

STEP 6: If $N(t) > m$, use the arrival times $A_1, \ldots, A_{N'}$ along with their service times $S_1, \ldots, S_{N'}$ to compute the value of $D = D_{>m}$.

Noting that

$E[D \mid N', A_1, \ldots, A_{N'}, S_1, \ldots, S_{N'}]$
$\quad = \displaystyle\sum_j E[D \mid N', A_1, \ldots, A_{N'}, S_1, \ldots, S_{N'}, N(t) = j]\; P\{N(t) = j \mid N', A_1, \ldots, A_{N'}, S_1, \ldots, S_{N'}\}$
$\quad = \displaystyle\sum_j E[D \mid N', A_1, \ldots, A_{N'}, S_1, \ldots, S_{N'}, N(t) = j]\; P\{N(t) = j\}$
$\quad = \displaystyle\sum_{j=0}^{m} D_j\, P\{N(t) = j\} + \sum_{j > m} D_{>m}\, P\{N(t) = j\}$
$\quad = \mathcal{E}$

(where the second equality uses that $N(t)$ is independent of $N', A_1, \ldots, A_{N'}, S_1, \ldots, S_{N'}$), we see that $\mathcal{E}$ is the conditional expectation of $D$ given some data. Consequently, the result follows from the conditional variance formula. ■
8.5.2 Computing Multidimensional Integrals of Monotone Functions


Suppose that we want to use simulation to estimate the $n$-dimensional integral

$\theta = \displaystyle\int_0^1 \int_0^1 \cdots \int_0^1 g(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$

With $U_1, \ldots, U_n$ being independent uniform $(0, 1)$ random variables, the preceding can be expressed as

$\theta = E[g(U_1, \ldots, U_n)]$

Suppose that $g$ is a nondecreasing function of each of its variables. That is, for fixed values $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$, the function $g(x_1, \ldots, x_i, \ldots, x_n)$ is nondecreasing in $x_i$, for each $i = 1, \ldots, n$. If we let $Y = \prod_{i=1}^{n} U_i$, then because both $Y$ and $g(U_1, \ldots, U_n)$ are nondecreasing functions of the $U_i$, it would seem that $E[\operatorname{Var}(g(U_1, \ldots, U_n) \mid Y)]$ might often be relatively small. Thus, we should consider estimating $\theta$ by stratifying on $\prod_{i=1}^{n} U_i$. To accomplish this, we need to first show

(a) how to generate $U_1, \ldots, U_n$ conditional on $\prod_{i=1}^{n} U_i$ having a specified value; and
(b) how to generate the value of $\prod_{i=1}^{n} U_i$ in a stratified fashion.


To accomplish both of the preceding objectives, we relate the $U_i$ to a Poisson process. Recall that $-\log(U)$ is exponential with rate 1, and interpret $-\log(U_i)$ as the time between the $(i-1)$th and the $i$th event of a Poisson process with rate 1. With this interpretation, the $j$th event of the Poisson process will occur at time $T_j$, where

$T_j = \displaystyle\sum_{i=1}^{j} -\log(U_i) = -\log(U_1 \cdots U_j)$

Because the sum of $n$ independent exponential random variables with rate 1 is a gamma $(n, 1)$ random variable, we can generate the value of $T_n = -\log(U_1 \cdots U_n)$ by generating (in a stratified fashion to be discussed) a gamma $(n, 1)$ random variable. This results in a generated value for $\prod_{i=1}^{n} U_i$, namely

$\displaystyle\prod_{i=1}^{n} U_i = e^{-T_n}$
i=] 
To generate the individual random variables $U_1, \ldots, U_n$ conditional on the value of their product, we use the Poisson process result that, conditional on the $n$th event of the Poisson process occurring at time $t$, the sequence of the first $n-1$ event times is distributed as an ordered sequence of $n-1$ independent uniform $(0, t)$ random variables. Thus, once the value of $T_n$ has been generated, the individual $U_i$ can be obtained by first generating $n-1$ random numbers $V_1, \ldots, V_{n-1}$, and then ordering them to obtain their ordered values $V_{(1)} < V_{(2)} < \cdots < V_{(n-1)}$. As $T_n V_{(j)}$ represents the time of event $j$, this yields

$T_n V_{(j)} = -\log(U_1 \cdots U_j) = -\log(U_1 \cdots U_{j-1}) - \log(U_j) = T_n V_{(j-1)} - \log(U_j)$

Therefore, with $V_{(0)} = 0$ and $V_{(n)} = 1$,

$U_j = e^{-T_n[V_{(j)} - V_{(j-1)}]}, \qquad j = 1, \ldots, n \qquad (8.11)$

Thus we see how to generate $U_1, \ldots, U_n$ conditional on the value of $\prod_{i=1}^{n} U_i$. To perform the stratification, we now make use of the fact that $T_n = -\log(\prod_{i=1}^{n} U_i)$ is a gamma $(n, 1)$ random variable. Let $G_n$ be the gamma $(n, 1)$ distribution function. If we plan to do $m$ simulation runs, then on the $k$th run a random number $U$ should be generated and $T_n$ should be taken to equal $G_n^{-1}\!\left(\frac{U + k - 1}{m}\right)$. For this value of $T_n$, we then use the preceding to simulate the values of $U_1, \ldots, U_n$ and calculate $g(U_1, \ldots, U_n)$. [That is, we generate $n-1$ random numbers, order them to obtain $V_{(1)} < V_{(2)} < \cdots < V_{(n-1)}$, and let the $U_j$ be given by (8.11).] The average of the values of $g$ obtained in the $m$ runs is the stratified sampling estimator of $E[g(U_1, \ldots, U_n)]$.
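The following is a minimal Python sketch of this stratified estimator. The standard library has no gamma quantile function, so the sketch assumes an integer $n$ and obtains $G_n^{-1}$ by bisection on the Erlang distribution function $G_n(x) = 1 - e^{-x}\sum_{k=0}^{n-1} x^k/k!$. This is a swapped-in technique; Remark 1 below mentions chi-squared approximations as an alternative. The integrand $g(u) = (u_1 + u_2 + u_3)^2$, which is nondecreasing in each coordinate and has mean 2.5, and all function names are hypothetical.

```python
import math
import random

def erlang_cdf(x, n):
    """G_n(x): the gamma (n, 1) distribution function for integer n."""
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total

def erlang_inv(p, n, tol=1e-12):
    """G_n^{-1}(p) by bisection (an assumed stand-in for a gamma quantile)."""
    lo, hi = 0.0, float(n)
    while erlang_cdf(hi, n) < p:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def uniforms_given_log_product(t_n, n, rng):
    """U_1, ..., U_n conditional on -log(U_1 ... U_n) = t_n, via (8.11)."""
    v = [0.0] + sorted(rng.random() for _ in range(n - 1)) + [1.0]
    return [math.exp(-t_n * (v[j] - v[j - 1])) for j in range(1, n + 1)]

def stratified_product_estimate(g, n, m, rng):
    """Average of g over m runs, with T_n = G_n^{-1}((U + k - 1)/m) on run k."""
    total = 0.0
    for k in range(1, m + 1):
        t_n = erlang_inv((rng.random() + k - 1) / m, n)
        total += g(uniforms_given_log_product(t_n, n, rng))
    return total / m

g = lambda u: sum(u) ** 2        # hypothetical nondecreasing integrand, mean 2.5 for n = 3
rng = random.Random(4)
print(stratified_product_estimate(g, 3, 4000, rng))
```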


Remarks

1. A gamma random variable with parameters $n, 1$ has the same distribution as does $\frac{1}{2}\chi^2_{2n}$, where $\chi^2_{2n}$ is a chi-squared random variable with $2n$ degrees of freedom. Consequently,

$G_n^{-1}(x) = \tfrac{1}{2} F^{-1}_{\chi^2_{2n}}(x)$

where $F^{-1}_{\chi^2_{2n}}(x)$ is the inverse of the distribution function of a chi-squared random variable with $2n$ degrees of freedom. Approximations for the inverse of the chi-squared distribution are readily available in the literature.

2. With a slight modification, we can apply the preceding stratification idea even when the underlying function is monotone increasing in some of its coordinates and monotone decreasing in the others. For instance, suppose we want to evaluate $E[h(U_1, \ldots, U_n)]$, where $h$ is monotone decreasing in its first coordinate and monotone increasing in the others. Using that $1 - U_1$ is also uniform on $(0, 1)$, we can write

$E[h(U_1, U_2, \ldots, U_n)] = E[h(1 - U_1, U_2, \ldots, U_n)] = E[g(U_1, U_2, \ldots, U_n)]$

where $g(x_1, x_2, \ldots, x_n) = h(1 - x_1, x_2, \ldots, x_n)$ is monotone increasing in each of its coordinates. ■


8.5.3 Compound Random Vectors


Let $N$ be a nonnegative integer valued random variable with probability mass function

$p(n) = P\{N = n\}$

and suppose that $N$ is independent of the sequence of independent and identically distributed random variables $X_1, X_2, \ldots$, having the common distribution function $F$. Then the random vector $(X_1, \ldots, X_N)$ is called a compound random vector. (When $N = 0$, call the compound random vector the null vector.)

For a family of functions $g_n(x_1, \ldots, x_n)$, $n \ge 0$, with $g_0 \equiv 0$, suppose we are interested in using simulation to estimate $E[g_N(X_1, \ldots, X_N)]$, for a specified compound random vector $(X_1, \ldots, X_N)$. Some functional families of interest are as follows.

• If

  $g_n(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} x_i > a \\ 0, & \text{otherwise} \end{cases}$

  then $E[g_N(X_1, \ldots, X_N)]$ is the probability that a compound random variable exceeds $a$.

• A generalization of the preceding example is, for $0 < \alpha < 1$, to take

  $g_n(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} \alpha^i x_i > a \\ 0, & \text{otherwise} \end{cases}$

  Now $E[g_N(X_1, \ldots, X_N)]$ is the probability that the discounted sum of a compound random vector exceeds $a$.

• Both of the previous examples are special cases of the situation where, for a specified sequence $\alpha_i$, $i \ge 1$,

  $g_n(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} \alpha_i x_i > a \\ 0, & \text{otherwise} \end{cases}$

• One is sometimes interested in a function of a weighted sum of the $k$ largest values of the random vector, leading to the consideration of the functions

  $g_n(x_1, \ldots, x_n) = g\left(\sum_{i=1}^{\min(k,n)} \alpha_i\, x_{(i:n)}\right)$

  where $x_{(i:n)}$ is the $i$th largest of the values $x_1, \ldots, x_n$, and where $g$ is a specified function with $g(0) = 0$.


To use simulation to estimate $\theta = E[g_N(X_1, \ldots, X_N)]$, choose a value of $m$ such that $P\{N > m\}$ is small, and suppose that we are able to simulate $N$ conditional on it exceeding $m$. With $p_n = P\{N = n\}$, conditioning on the mutually exclusive and exhaustive possibilities that $N$ is 0 or 1, ..., or $m$, or that it exceeds $m$, yields

$\theta = \displaystyle\sum_{n=0}^{m} E[g_N(X_1, \ldots, X_N) \mid N = n]\, p_n + E[g_N(X_1, \ldots, X_N) \mid N > m]\, P\{N > m\}$
$\quad = \displaystyle\sum_{n=0}^{m} E[g_n(X_1, \ldots, X_n) \mid N = n]\, p_n + E[g_N(X_1, \ldots, X_N) \mid N > m]\, P\{N > m\}$
$\quad = \displaystyle\sum_{n=0}^{m} E[g_n(X_1, \ldots, X_n)]\, p_n + E[g_N(X_1, \ldots, X_N) \mid N > m]\left(1 - \sum_{n=0}^{m} p_n\right)$

where the final equality made use of the independence of $N$ and $X_1, \ldots, X_n$.

To effect a simulation run to estimate $E[g_N(X_1, \ldots, X_N)]$, first generate the value of $N$ conditional on it exceeding $m$. Suppose the generated value is $m'$. Then generate $m'$ independent random variables $X_1, \ldots, X_{m'}$ having distribution function $F$. That completes a simulation run, with the estimator from that run being

$\mathcal{E} = \displaystyle\sum_{n=1}^{m} g_n(X_1, \ldots, X_n)\, p_n + g_{m'}(X_1, \ldots, X_{m'})\left(1 - \sum_{n=0}^{m} p_n\right)$
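Here is a minimal Python sketch of one run of this estimator for the first family above, $g_n = 1\{x_1 + \cdots + x_n > a\}$. It assumes, for illustration, that $N$ is Poisson with mean $\lambda$ and that the $X_i$ are exponential with rate 1. Under these assumptions the exact answer $\sum_n p_n P\{\text{gamma}(n, 1) > a\}$ is available for comparison. The parameters $\lambda = 3$, $a = 6$, $m = 10$ and all function names are hypothetical.

```python
import math
import random
import statistics

def poisson_pmfs(lam, m):
    """p_0, ..., p_m for N ~ Poisson(lam)."""
    p = [math.exp(-lam)]
    for n in range(1, m + 1):
        p.append(p[-1] * lam / n)
    return p

def poisson_beyond(lam, m, rng):
    """Inverse transform for N ~ Poisson(lam) conditioned on N > m (needs m + 1 > lam)."""
    first = math.exp(-lam + (m + 1) * math.log(lam) - math.lgamma(m + 2))
    tail, q, k = 0.0, first, m + 1
    while q > 1e-17 * (tail + q):       # P{N > m}, summed term by term
        tail += q
        k += 1
        q *= lam / k
    target, k, q, cum = rng.random() * tail, m + 1, first, first
    while cum < target and q > 0.0:
        k += 1
        q *= lam / k
        cum += q
    return k

def compound_run(lam, a, m, rng):
    """One run of the estimator for g_n(x) = 1{x_1 + ... + x_n > a}."""
    p = poisson_pmfs(lam, m)
    m_prime = poisson_beyond(lam, m, rng)
    est, s = 0.0, 0.0
    for n in range(1, m_prime + 1):
        s += rng.expovariate(1.0)           # s = X_1 + ... + X_n
        if n <= m:
            est += p[n] * (s > a)           # g_n(X_1, ..., X_n) p_n
    return est + (1.0 - sum(p)) * (s > a)   # g_{m'}(X_1, ..., X_{m'}) P{N > m}

def exact_tail(lam, a, terms=100):
    """P{X_1 + ... + X_N > a} = sum_n p_n P{gamma(n, 1) > a}, for comparison."""
    total, pn = 0.0, math.exp(-lam)
    for n in range(1, terms):
        pn *= lam / n
        total += pn * math.exp(-a) * sum(a ** k / math.factorial(k) for k in range(n))
    return total

rng = random.Random(6)
runs = [compound_run(3.0, 6.0, 10, rng) for _ in range(5000)]
print(statistics.mean(runs), exact_tail(3.0, 6.0))
```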
Remarks 


1. If it is relatively easy to compute the values of the functions g,, we recom- 
mend that one also use the data X,,..., X„ in the reverse order to obtain a 
second estimator, and then average the two estimators. That is, use the run 
estimator 


1 m 
Grr Goes Se 


n=] 


E Xman) Pn + Smt (Xm EEE X,) i = A) 


n=0 
2. If it is difficult to generate $N$ conditional on its value exceeding $m$, it is often worthwhile to try to bound $E[g_N(X_1,\ldots,X_N) \mid N > m]\,P\{N > m\}$ and then choose $m$ large enough to make the bound negligibly small. (For instance, if the functions $g_n$ are indicator (that is, 0 or 1) functions, then $E[g_N(X_1,\ldots,X_N) \mid N > m]\,P\{N > m\} \le P\{N > m\}$.) A simulation that ignores the term $E[g_N(X_1,\ldots,X_N) \mid N > m]\,P\{N > m\}$ will often be sufficiently accurate.
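The run estimator above can be sketched in Python. This is a minimal illustration under assumptions that are not in the text: $N$ is taken to be Poisson with mean 5 (so that $N$ given $N > m$ can be generated by inverse transform), $m = 10$, the $X_i$ are exponential with rate 1, and $g_n$ is the hypothetical indicator that $x_1 + \cdots + x_n$ exceeds 8 (so $g_0 = 0$). All helper names are illustrative.

```python
import math
import random


def poisson_pmf(lam, n_max):
    """Return [P{N = 0}, ..., P{N = n_max}] for N ~ Poisson(lam)."""
    p = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        p.append(p[-1] * lam / n)
    return p


def poisson_given_above(lam, m, rng):
    """Inverse-transform sample of N conditional on N > m."""
    p = poisson_pmf(lam, m)
    target = rng.random() * (1.0 - sum(p))  # uniform on (0, P{N > m})
    n, pn, cum = m, p[m], 0.0
    while True:
        n += 1
        pn *= lam / n
        cum += pn
        if cum >= target or pn == 0.0:  # pn == 0.0 guards against round-off
            return n


def run_estimator(g, lam, m, sample_x, rng):
    """sum_{n=1}^m g_n(X_1..X_n) p_n + g_{m'}(X_1..X_{m'}) (1 - sum_{n=0}^m p_n)."""
    p = poisson_pmf(lam, m)
    tail = 1.0 - sum(p)
    m_prime = poisson_given_above(lam, m, rng)
    xs = [sample_x(rng) for _ in range(m_prime)]
    est = sum(g(xs[:n]) * p[n] for n in range(1, m + 1))  # n = 0 term is g_0 = 0
    return est + g(xs) * tail


def estimate(g, lam, m, sample_x, runs, seed=0):
    rng = random.Random(seed)
    return sum(run_estimator(g, lam, m, sample_x, rng) for _ in range(runs)) / runs


THRESHOLD = 8.0  # hypothetical threshold c


def sum_exceeds(xs):
    """Hypothetical g_n: indicator that x_1 + ... + x_n > THRESHOLD."""
    return 1.0 if sum(xs) > THRESHOLD else 0.0


def exp_rate_one(rng):
    return rng.expovariate(1.0)


theta_hat = estimate(sum_exceeds, lam=5.0, m=10, sample_x=exp_rate_one, runs=20_000)
print(theta_hat)
```

Because $X_1,\ldots,X_m$ are always available when $m' > m$, a single draw of the conditioned $N$ supplies every term of the estimator.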


8.6 Importance Sampling 


Let $X = (X_1,\ldots,X_n)$ denote a vector of random variables having a joint density function (or joint mass function in the discrete case) $f(x) = f(x_1,\ldots,x_n)$, and suppose that we are interested in estimating

$$\theta = E[h(X)] = \int h(x) f(x)\, dx$$

where the preceding is an $n$-dimensional integral over all possible values of $x$. (If the $X_i$ are discrete, then interpret the integral as an $n$-fold summation.)

Suppose that a direct simulation of the random vector $X$, so as to compute values of $h(X)$, is inefficient, possibly because (a) it is difficult to simulate a random vector having density function $f(x)$, or (b) the variance of $h(X)$ is large, or (c) a combination of (a) and (b).

Another way in which we can use simulation to estimate $\theta$ is to note that if $g(x)$ is another probability density such that $f(x) = 0$ whenever $g(x) = 0$, then we can express $\theta$ as

$$\theta = \int \frac{h(x) f(x)}{g(x)}\, g(x)\, dx = E_g\!\left[\frac{h(X) f(X)}{g(X)}\right] \qquad (8.12)$$

where we have written $E_g$ to emphasize that the random vector $X$ has joint density $g(x)$.

It follows from Equation (8.12) that $\theta$ can be estimated by successively generating values of a random vector $X$ having density function $g(x)$ and then using as the estimator the average of the values of $h(X)f(X)/g(X)$. If a density function $g(x)$ can be chosen so that the random variable $h(X)f(X)/g(X)$ has a small variance, then this approach, referred to as importance sampling, can result in an efficient estimator of $\theta$.
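A minimal Python sketch of the estimator built on (8.12): draw $X$ from $g$ and average $h(X)f(X)/g(X)$. The example target, $P\{X > 20\}$ for $X$ exponential with rate 1, and the sampling density (exponential with rate 0.05) are hypothetical choices made only for illustration; the function name is illustrative as well.

```python
import math
import random


def importance_sampling(h, f, g, sample_g, k, seed=0):
    """Average of h(X) f(X) / g(X) over k independent draws X ~ g (Equation 8.12)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(k):
        x = sample_g(rng)
        total += h(x) * f(x) / g(x)
    return total / k


# Hypothetical example: theta = P{X > 20} with X ~ Exp(1), so h is an indicator.
# Sampling from an exponential with the smaller rate 0.05 sends many draws past 20.
rate_g = 0.05
theta_hat = importance_sampling(
    h=lambda x: 1.0 if x > 20 else 0.0,
    f=lambda x: math.exp(-x),
    g=lambda x: rate_g * math.exp(-rate_g * x),
    sample_g=lambda rng: rng.expovariate(rate_g),
    k=100_000,
)
print(theta_hat, math.exp(-20))  # estimate vs exact e^{-20}
```

A raw simulation with 100,000 draws from $f$ would almost certainly return 0 here, since $e^{-20} \approx 2 \times 10^{-9}$.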

Let us now try to obtain a feel for why importance sampling can be useful. To begin, note that $f(X)$ and $g(X)$ represent the respective likelihoods of obtaining the vector $X$ when $X$ is a random vector with respective densities $f$ and $g$. Hence, if $X$ is distributed according to $g$, then it will usually be the case that $f(X)$ will be small in relation to $g(X)$, and thus when $X$ is simulated according to $g$ the likelihood ratio $f(X)/g(X)$ will usually be small in comparison to 1. However, it is easy to check that its mean is 1:

$$E_g\!\left[\frac{f(X)}{g(X)}\right] = \int \frac{f(x)}{g(x)}\, g(x)\, dx = \int f(x)\, dx = 1$$

Thus we see that even though f(X)/g(X) is usually smaller than 1, its mean is 
equal to 1, thus implying that it is occasionally large and so will tend to have a 
large variance. So how can h(X) f(X)/g(X) have a small variance? The answer 
is that we can sometimes arrange to choose a density g such that those values 
of x for which f(x)/g(x) is large are precisely the values for which h(x) is 
exceedingly small, and thus the ratio h(X)f(X)/g(X) is always small. Since this 
will require that h(x) is sometimes small, importance sampling seems to work 
best when estimating a small probability, for in this case the function h(x) is 
equal to 1 when x lies in some set and is equal to 0 otherwise. 

We will now consider how to select an appropriate density $g$. We will find that the so-called tilted densities are useful. Let $M(t) = E_f[e^{tX}] = \int e^{tx} f(x)\, dx$ be the moment generating function corresponding to a one-dimensional density $f$.

Definition A density function

$$f_t(x) = \frac{e^{tx} f(x)}{M(t)}$$

is called a tilted density of $f$, $-\infty < t < \infty$.

A random variable with density $f_t$ tends to be larger than one with density $f$ when $t > 0$ and tends to be smaller when $t < 0$.
In certain cases the tilted densities $f_t$ have the same parametric form as does $f$.


Example 8t If $f$ is the exponential density with rate $\lambda$, then

$$f_t(x) = C e^{tx} \lambda e^{-\lambda x} = C\lambda e^{-(\lambda - t)x}$$

where $C = 1/M(t)$ does not depend on $x$. Therefore, for $t < \lambda$, $f_t$ is an exponential density with rate $\lambda - t$.

If $f$ is a Bernoulli probability mass function with parameter $p$, then

$$f(x) = p^x (1-p)^{1-x}, \quad x = 0, 1$$

Hence, $M(t) = E_f[e^{tX}] = pe^t + 1 - p$, and so

$$f_t(x) = \frac{1}{M(t)} (pe^t)^x (1-p)^{1-x} = \left(\frac{pe^t}{pe^t + 1 - p}\right)^x \left(\frac{1-p}{pe^t + 1 - p}\right)^{1-x}$$

That is, $f_t$ is the probability mass function of a Bernoulli random variable with parameter $p_t = (pe^t)/(pe^t + 1 - p)$.

We leave it as an exercise to show that if $f$ is a normal density with parameters $\mu$ and $\sigma^2$, then $f_t$ is a normal density having mean $\mu + \sigma^2 t$ and variance $\sigma^2$. □


In certain situations the quantity of interest is the sum of the independent random variables $X_1,\ldots,X_n$. In this case the joint density $f$ is the product of one-dimensional densities. That is,

$$f(x_1,\ldots,x_n) = f_1(x_1)\cdots f_n(x_n)$$

where $f_i$ is the density function of $X_i$. In this situation it is often useful to generate the $X_i$ according to their tilted densities, with a common choice of $t$ employed.


Example 8u Let $X_1,\ldots,X_n$ be independent random variables having respective probability density (or mass) functions $f_i$, for $i = 1,\ldots,n$. Suppose we are interested in approximating the probability that their sum is at least as large as $a$, where $a$ is much larger than the mean of the sum. That is, we are interested in

$$\theta = P\{S \ge a\}$$

where $S = \sum_{i=1}^n X_i$ and where $a > \sum_{i=1}^n E[X_i]$. Letting $I\{S \ge a\}$ equal 1 if $S \ge a$ and letting it be 0 otherwise, we have that

$$\theta = E_f[I\{S \ge a\}]$$

where $f = (f_1,\ldots,f_n)$. Suppose now that we simulate $X_i$ according to the tilted mass function $f_{i,t}$, $i = 1,\ldots,n$, with the value of $t$, $t > 0$, left to be determined. The importance sampling estimator of $\theta$ would then be

$$\hat\theta = I\{S \ge a\} \prod_{i=1}^n \frac{f_i(X_i)}{f_{i,t}(X_i)}$$

Now,

$$\frac{f_i(X_i)}{f_{i,t}(X_i)} = M_i(t)\, e^{-tX_i}$$

and so

$$\hat\theta = I\{S \ge a\}\, M(t)\, e^{-tS}$$

where $M(t) = \prod_{i=1}^n M_i(t)$ is the moment generating function of $S$. Since $t > 0$ and $I\{S \ge a\}$ is equal to 0 when $S < a$, it follows that

$$I\{S \ge a\}\, e^{-tS} \le e^{-ta}$$

and so

$$\hat\theta \le M(t)\, e^{-ta}$$

To make the bound on the estimator as small as possible we thus choose $t$, $t > 0$, to minimize $M(t)e^{-ta}$. In doing so, we will obtain an estimator whose value on each iteration is between 0 and $\min_t M(t)e^{-ta}$. It can be shown that the minimizing $t$, call it $t^*$, is such that

$$E_{t^*}[S] = \sum_{i=1}^n E_{t^*}[X_i] = a$$

where, in the preceding, we mean that the expected value is to be taken under the assumption that the distribution of $X_i$ is $f_{i,t^*}$ for $i = 1,\ldots,n$.
For instance, suppose that $X_1,\ldots,X_n$ are independent Bernoulli random variables having respective parameters $p_i$, for $i = 1,\ldots,n$. Then, if we generate the $X_i$ according to their tilted mass functions $p_{i,t}$, $i = 1,\ldots,n$, the importance sampling estimator of $\theta = P\{S \ge a\}$ is

$$\hat\theta = I\{S \ge a\}\, e^{-tS} \prod_{i=1}^n (p_i e^t + 1 - p_i)$$

Since $p_{i,t}$ is the mass function of a Bernoulli random variable with parameter $(p_i e^t)/(p_i e^t + 1 - p_i)$, it follows that

$$E_t[S] = \sum_{i=1}^n \frac{p_i e^t}{p_i e^t + 1 - p_i}$$

The value of $t$ that makes the preceding equal to $a$ can be numerically approximated and the resulting $t$ utilized in the simulation.
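One way to carry out that numerical approximation, sketched in Python: $E_t[S]$ is increasing in $t$ (its derivative is the variance of $S$ under the tilted distribution), so bisection on $t$ works whenever $\sum_i p_i < a < n$. The bracket $[0, 50]$ and the function names are assumptions of this sketch.

```python
import math


def tilted_mean(t, ps):
    """E_t[S] = sum_i p_i e^t / (p_i e^t + 1 - p_i) for independent Bernoulli(p_i)."""
    et = math.exp(t)
    return sum(p * et / (p * et + 1 - p) for p in ps)


def solve_tilt(ps, a, hi=50.0, tol=1e-12):
    """Bisection for t* > 0 with E_t*[S] = a; assumes sum(ps) < a < len(ps)."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tilted_mean(mid, ps) < a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_star = solve_tilt([0.4] * 20, 16)
print(t_star, math.exp(t_star))
```

For the case $n = 20$, $p_i = 0.4$, $a = 16$ this recovers $e^{t^*} = 6$, the value derived by hand in the illustration that follows.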
As an illustration, suppose that $n = 20$, $p_i = 0.4$, and $a = 16$. Then

$$E_t[S] = 20\, \frac{0.4e^t}{0.4e^t + 0.6}$$

Setting this equal to 16 yields, after a little algebra, that

$$e^{t^*} = 6$$

Thus, if we generate the Bernoullis using the parameter $(0.4e^{t^*})/(0.4e^{t^*} + 0.6) = 0.8$, then as

$$M(t^*) = (0.4e^{t^*} + 0.6)^{20} = 3^{20} \quad \text{and} \quad e^{-t^* S} = (1/6)^S$$

we see that the importance sampling estimator is

$$\hat\theta = I\{S \ge 16\}\, (1/6)^S\, 3^{20}$$

It follows from the preceding that

$$\hat\theta \le (1/6)^{16}\, 3^{20} = 81/2^{16} = 0.001236$$

That is, on each iteration the value of the estimator is between 0 and 0.001236. Since, in this case, $\theta$ is the probability that a binomial random variable with parameters 20, 0.4 is at least 16, it can be explicitly computed with the result $\theta = 0.000317$. Hence, the raw simulation estimator $I$, which on each iteration takes the value 0 if the sum of the Bernoullis with parameter 0.4 is less than 16 and takes the value 1 otherwise, will have variance

$$\mathrm{Var}(I) = \theta(1 - \theta) = 3.169 \times 10^{-4}$$

On the other hand, it follows from the fact that $0 \le \hat\theta \le 0.001236$ that (see Exercise 29)

$$\mathrm{Var}(\hat\theta) \le 2.9131 \times 10^{-7}$$

□
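The illustration can be checked with a short Python sketch. It generates the 20 Bernoullis with the tilted parameter 0.8 (that is, $t^* = \log 6$ from the text) and records $I\{S \ge 16\}M(t^*)e^{-t^*S}$ on each run; the run count and seed are arbitrary choices.

```python
import math
import random


def binomial_tail_is(n, p, a, t, k, seed=0):
    """k values of I{S >= a} M(t) e^{-tS}, with each Bernoulli drawn from its tilted pmf."""
    rng = random.Random(seed)
    et = math.exp(t)
    p_t = p * et / (p * et + 1 - p)   # tilted Bernoulli parameter
    mgf = (p * et + 1 - p) ** n       # M(t), the product of the n Bernoulli MGFs
    values = []
    for _ in range(k):
        s = sum(rng.random() < p_t for _ in range(n))
        values.append(mgf * math.exp(-t * s) if s >= a else 0.0)
    return values


vals = binomial_tail_is(n=20, p=0.4, a=16, t=math.log(6), k=50_000)
est = sum(vals) / len(vals)
var = sum((v - est) ** 2 for v in vals) / (len(vals) - 1)
exact = sum(math.comb(20, j) * 0.4**j * 0.6**(20 - j) for j in range(16, 21))
print(est, exact, var)  # estimate, exact theta, sample variance of one run
```

Every value lies in $[0, 81/2^{16}]$, and the sample variance should sit below the bound $2.9131 \times 10^{-7}$, roughly a thousandth of $\mathrm{Var}(I)$.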


Example 8v Consider a single server queue in which the times between successive customer arrivals have density function $f$ and the service times have density $g$. Let $D_n$ denote the amount of time that the $n$th arrival spends waiting in queue and suppose we are interested in estimating $\alpha = P\{D_n > a\}$ when $a$ is much larger than $E[D_n]$. Rather than generating the successive interarrival and service times according to $f$ and $g$, respectively, we should generate them according to the densities $f_{-t}$ and $g_t$, where $t$ is a positive number to be determined. Note that using these distributions as opposed to $f$ and $g$ will result in smaller interarrival times (since $-t < 0$) and larger service times. Hence, there will be a greater chance that $D_n > a$ than if we had simulated using the densities $f$ and $g$. The importance sampling estimator of $\alpha$ would then be

$$\hat\alpha = I\{D_n > a\}\, e^{t(S_n - Y_n)} \left[M_f(-t)\, M_g(t)\right]^n$$

where $S_n$ is the sum of the first $n$ interarrival times, $Y_n$ is the sum of the first $n$ service times, and $M_f$ and $M_g$ are the moment generating functions of the densities $f$ and $g$, respectively. The value of $t$ used should be determined by experimenting with a variety of different choices. □


Example 8w Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed normal random variables having mean $\mu$ and variance 1, where $\mu < 0$. An important problem in the theory of quality control (specifically in the analysis of cumulative sum charts) is to determine the probability that the partial sums of these values exceed $B$ before going below $-A$. That is, let

$$S_n = \sum_{i=1}^n X_i$$

and define

$$N = \min\{n : \text{either } S_n < -A \text{ or } S_n > B\}$$

where $A$ and $B$ are fixed positive numbers. We are now interested in estimating

$$\theta = P\{S_N > B\}$$

An effective way of estimating $\theta$ is by simulating the $X_i$ as if they were normal with mean $-\mu$ and variance 1, stopping again when their sum either exceeds $B$ or falls below $-A$. (Since $-\mu$ is positive, the stopped sum is greater than $B$ more often than if we were simulating with the original negative mean.) If $X_1,\ldots,X_N$ denote the simulated variables (each being normal with mean $-\mu$ and variance 1) and

$$I = \begin{cases} 1 & \text{if } \sum_{i=1}^N X_i > B \\ 0 & \text{otherwise} \end{cases}$$

then the estimate of $\theta$ from this run is

$$I \prod_{i=1}^N \frac{f_\mu(X_i)}{f_{-\mu}(X_i)} \qquad (8.13)$$

where $f_c$ is the normal density with mean $c$ and variance 1. Since

$$\frac{f_\mu(x)}{f_{-\mu}(x)} = \frac{\exp\{-(x-\mu)^2/2\}}{\exp\{-(x+\mu)^2/2\}} = e^{2\mu x}$$

it follows from (8.13) that the estimator of $\theta$ based on this run is

$$I \prod_{i=1}^N \exp\{2\mu X_i\} = I \exp\{2\mu S_N\}$$

When $I$ is equal to 1, $S_N$ exceeds $B$ and, since $\mu < 0$, the estimator in this case is less than $e^{2\mu B}$. That is, rather than obtaining from each run either the value 0 or 1, as would occur if we did a straight simulation, we obtain in this case either the value 0 or a value that is less than $e^{2\mu B}$, which strongly indicates why this importance sampling approach results in a reduced variance. For example, if $\mu = -0.1$ and $B = 5$, then the estimate from each run lies between 0 and $e^{-1} = 0.3679$. In addition, the above is theoretically important because it shows that

$$P\{\text{cross } B \text{ before } -A\} \le e^{2\mu B}$$

Since the above is true for all positive $A$, we obtain the interesting result

$$P\{\text{ever cross } B\} \le e^{2\mu B}$$

□
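A vectorized Python sketch of Example 8w, comparing the importance sampling estimator $I\exp\{2\mu S_N\}$ (steps drawn with mean $-\mu$) against the raw 0/1 estimator (steps drawn with mean $\mu$). The values $\mu = -0.1$, $A = B = 5$, the run count and the seed are assumptions for illustration; the example assumes NumPy is available.

```python
import numpy as np


def stopped_sums(A, B, mean, k, rng):
    """Run k random walks with N(mean, 1) steps until S_n < -A or S_n > B; return each S_N."""
    s = np.zeros(k)
    active = np.ones(k, dtype=bool)
    while active.any():
        s[active] += rng.normal(mean, 1.0, size=active.sum())
        active &= (s >= -A) & (s <= B)  # a run stops once it leaves [-A, B]
    return s


mu, A, B, k = -0.1, 5.0, 5.0, 40_000
rng = np.random.default_rng(1)

# Importance sampling: simulate with mean -mu, estimator I * exp(2 mu S_N)
s_is = stopped_sums(A, B, -mu, k, rng)
est_is = np.where(s_is > B, np.exp(2 * mu * s_is), 0.0)

# Raw simulation with the original mean mu, for comparison
s_raw = stopped_sums(A, B, mu, k, rng)
est_raw = (s_raw > B).astype(float)

print(est_is.mean(), est_is.var(), est_raw.mean(), est_raw.var())
```

Both sample means estimate the same $\theta$, but each importance sampling value is bounded by $e^{2\mu B} = e^{-1}$, which is where the variance reduction comes from.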


Example 8x Let $X = (X_1,\ldots,X_{100})$ be a random permutation of $(1, 2, \ldots, 100)$. That is, $X$ is equally likely to be any of the $(100)!$ permutations. Suppose we are interested in using simulation to estimate

$$\theta = P\left\{\sum_{j=1}^{100} jX_j > 290{,}000\right\}$$

To obtain a feel for the magnitude of $\theta$, we can start by computing the mean and standard deviation of $\sum_{j=1}^{100} jX_j$. Indeed, it is not difficult to show that

$$E\left[\sum_{j=1}^{100} jX_j\right] = 100(101)^2/4 = 255{,}025$$

$$SD\left(\sum_{j=1}^{100} jX_j\right) = \sqrt{(99)(100)^2(101)^2/144} = 8374.478$$

Hence, if we suppose that $\sum_{j=1}^{100} jX_j$ is roughly normally distributed then, with $Z$ representing a standard normal random variable, we have that

$$\theta \approx P\left\{Z > \frac{290{,}000 - 255{,}025}{8374.478}\right\} = P\{Z > 4.1764\} = 0.00001481$$

Thus, $\theta$ is clearly a small probability and so an importance sampling estimator is worth considering.
To utilize importance sampling we would want to generate the permutation $X$ so that there is a much larger probability that $\sum_{j=1}^{100} jX_j > 290{,}000$. Indeed, we should try for a probability of about 0.5. Now, $\sum_{j=1}^{100} jX_j$ will attain its largest value when $X_j = j$, $j = 1,\ldots,100$, and indeed it will tend to be large when $X_j$ tends to be large when $j$ is large and small when $j$ is small. One way to generate a permutation $X$ that will tend to be of this type is as follows: Generate independent exponential random variables $Y_j$, $j = 1,\ldots,100$, with respective rates $\lambda_j$, $j = 1,\ldots,100$, where $\lambda_j$, $j = 1,\ldots,100$, is an increasing sequence whose values will soon be specified. Now, for $j = 1,\ldots,100$, let $X_j$ be the index of the $j$th largest of these generated values. That is,

$$Y_{X_1} > Y_{X_2} > \cdots > Y_{X_{100}}$$

Since, for $j$ large, $Y_j$ will tend to be one of the smaller $Y$'s, it follows that $X_j$ will tend to be large when $j$ is large and so $\sum_{j=1}^{100} jX_j$ will tend to be larger than if $X$ were a uniformly distributed permutation.

Let us now compute $E\left[\sum_{j=1}^{100} jX_j\right]$. To do so, let $R(j)$ denote the rank of $Y_j$, $j = 1,\ldots,100$, where rank 1 signifies the largest, rank 2 the second largest, and so on until rank 100, which is the smallest. Note that since $X_j$ is the index of the $j$th largest of the $Y$'s, it follows that $R(X_j) = j$. Hence,

$$\sum_{j=1}^{100} jX_j = \sum_{j=1}^{100} R(X_j)\, X_j = \sum_{j=1}^{100} j\, R(j)$$

where the final equality follows since $X_1,\ldots,X_{100}$ is a permutation of $1,\ldots,100$. Therefore, we see that

$$E\left[\sum_{j=1}^{100} jX_j\right] = \sum_{j=1}^{100} j\, E[R(j)]$$

To compute $E[R(j)]$, let $I(i, j) = 1$ if $Y_j < Y_i$ and let it be 0 otherwise, and note that

$$R(j) = 1 + \sum_{i \ne j} I(i, j)$$

In words, the preceding equation states that the rank of $Y_j$ is 1 plus the number of the $Y_i$ that are larger than it. Hence, taking expectations and using the fact that

$$P\{Y_j < Y_i\} = \frac{\lambda_j}{\lambda_i + \lambda_j}$$

we obtain that

$$E\left[\sum_{j=1}^{100} jX_j\right] = \sum_{j=1}^{100} j\left(1 + \sum_{i \ne j} \frac{\lambda_j}{\lambda_i + \lambda_j}\right)$$

If we let $\lambda_j = j^{0.7}$, $j = 1,\ldots,100$, then a computation shows that $E\left[\sum_{j=1}^{100} jX_j\right] = 290{,}293.6$, and so when $X$ is generated using these rates it would seem that

$$P\left\{\sum_{j=1}^{100} jX_j > 290{,}000\right\} \approx 0.5$$

Thus, we suggest that the simulation estimator should be obtained by first generating independent exponentials $Y_j$ with respective rates $j^{0.7}$, and then letting $X_j$ be the index of the $j$th largest, $j = 1,\ldots,100$. Let $I = 1$ if $\sum_{j=1}^{100} jX_j > 290{,}000$ and let it be 0 otherwise. Now, the outcome will be $X$ when $Y_{X_{100}}$ is the smallest $Y$, $Y_{X_{99}}$ is the second smallest, and so on. The probability of this outcome is $1/(100)!$ when $X$ is equally likely to be any of the permutations, whereas its probability when the simulation is as performed is

$$\frac{(X_{100})^{0.7}}{\sum_{j=1}^{100}(X_j)^{0.7}} \cdot \frac{(X_{99})^{0.7}}{\sum_{j=1}^{99}(X_j)^{0.7}} \cdots \frac{(X_2)^{0.7}}{(X_1)^{0.7} + (X_2)^{0.7}} \cdot \frac{(X_1)^{0.7}}{(X_1)^{0.7}}$$

Therefore, the importance sampling estimator from a single run is

$$\hat\theta = \frac{I}{(100)!} \prod_{n=1}^{100} \frac{\sum_{j=1}^{n} (X_j)^{0.7}}{(X_n)^{0.7}} = \frac{I \prod_{n=1}^{100} \left(\sum_{j=1}^{n} (X_j)^{0.7}\right)}{\left((100)!\right)^{1.7}}$$

Before the simulation is begun, the values of $C = 1.7\sum_{n=1}^{100} \log(n)$ and $a(j) = -j^{-0.7}$, $j = 1,\ldots,100$, should be computed. A simulation run can then be obtained as follows:

For j = 1 to 100
    Generate a random number U
    Y_j = a(j) log U
Next
Let X_j, j = 1,...,100, be such that Y_{X_j} is the jth largest Y
If Σ j X_j ≤ 290,000 (sum over j = 1,...,100), set θ̂ = 0 and stop
S = 0, P = 0
For n = 1 to 100
    S = S + (X_n)^0.7
    P = P + log(S)
Next
θ̂ = e^(P − C)


A sample of 50,000 simulation runs yielded the estimate $\hat\theta = 3.77 \times 10^{-6}$, with a sample variance $1.89 \times 10^{-8}$. Since the variance of the raw simulation estimator, which is equal to 1 if $\sum_{j=1}^{100} jX_j > 290{,}000$ and is equal to 0 otherwise, is $\mathrm{Var}(I) = \theta(1 - \theta) \approx 3.77 \times 10^{-6}$, we see that

$$\frac{\mathrm{Var}(I)}{\mathrm{Var}(\hat\theta)} \approx 199.47$$

□


Importance sampling is also quite useful in estimating a conditional expectation when one is conditioning on a rare event. That is, suppose $X$ is a random vector with density function $f$ and that we are interested in estimating

$$\theta = E[h(X) \mid X \in A]$$

where $h(x)$ is an arbitrary real-valued function and where $P\{X \in A\}$ is a small unknown probability. Since the conditional density of $X$ given that it lies in $A$ is

$$f(x \mid X \in A) = \frac{f(x)}{P\{X \in A\}}, \quad x \in A$$

we have that

$$\theta = \frac{\int_{x \in A} h(x) f(x)\, dx}{P\{X \in A\}} = \frac{E[h(X) I(X \in A)]}{E[I(X \in A)]} = \frac{E[N]}{E[D]}$$

where $E[N]$ and $E[D]$ are defined to equal the numerator and denominator in the preceding, and $I(X \in A)$ is defined to be 1 if $X \in A$ and 0 otherwise. Hence, rather than simulating $X$ according to the density $f$, which would make it very unlikely to be in $A$, we can simulate it according to some other density $g$ which makes this event more likely. If we simulate $k$ random vectors $X^1,\ldots,X^k$ according to $g$, then we can estimate $E[N]$ by $\frac{1}{k}\sum_{i=1}^k N_i$ and $E[D]$ by $\frac{1}{k}\sum_{i=1}^k D_i$, where

$$N_i = \frac{h(X^i)\, I(X^i \in A)\, f(X^i)}{g(X^i)}$$

and

$$D_i = \frac{I(X^i \in A)\, f(X^i)}{g(X^i)}$$

Thus, we obtain the following estimator of $\theta$:

$$\hat\theta = \frac{\sum_{i=1}^k h(X^i)\, I(X^i \in A)\, f(X^i)/g(X^i)}{\sum_{i=1}^k I(X^i \in A)\, f(X^i)/g(X^i)} \qquad (8.14)$$

The mean square error of this estimator can then be estimated by the bootstrap approach (see, for instance, Example 7e).


Example 8y Let $X_i$ be independent exponential random variables with respective rates $1/(i+2)$, $i = 1, 2, 3, 4$. Let $S = \sum_{i=1}^4 X_i$, and suppose that we want to estimate $\theta = E[S \mid S > 62]$. To accomplish this, we can use importance sampling with the tilted distributions. That is, we can choose a value $t$ and then simulate the $X_i$ with rates $1/(i+2) - t$. If we choose $t = 0.14$, then $E_t[S] = 68.43$. So, let us generate $k$ sets of exponential random variables $X_i$ with rates $1/(i+2) - 0.14$, $i = 1, 2, 3, 4$, and let $S_j$ be the sum of the $j$th set, $j = 1,\ldots,k$. Then we can estimate

$$E[S\, I(S > 62)] \quad \text{by} \quad \frac{C}{k}\sum_{j=1}^k S_j\, I(S_j > 62)\, e^{-0.14 S_j}$$

$$E[I(S > 62)] \quad \text{by} \quad \frac{C}{k}\sum_{j=1}^k I(S_j > 62)\, e^{-0.14 S_j}$$

where $C = \prod_{i=1}^4 \frac{1}{1 - 0.14(i+2)} = 81.635$. The estimator of $\theta$ is

$$\hat\theta = \frac{\sum_{j=1}^k S_j\, I(S_j > 62)\, e^{-0.14 S_j}}{\sum_{j=1}^k I(S_j > 62)\, e^{-0.14 S_j}}$$

□
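A NumPy sketch of Example 8y. It draws $k$ tilted sets, forms the two sums and their ratio as in (8.14), and compares the result with $E[S \mid S > 62]$ computed from the closed-form survival function of a sum of exponentials with distinct rates, $P\{S > s\} = \sum_i c_i e^{-\lambda_i s}$ with $c_i = \prod_{j \ne i} \lambda_j/(\lambda_j - \lambda_i)$. That closed form is added here only as a check; it is not part of the text. The run count and seed are arbitrary.

```python
import math

import numpy as np


def exact_conditional_mean(lams, cutoff):
    """E[S | S > cutoff] for a sum of independent exponentials with distinct rates."""
    coef = [math.prod(lj / (lj - li) for j, lj in enumerate(lams) if j != i)
            for i, li in enumerate(lams)]
    surv = sum(ci * math.exp(-li * cutoff) for ci, li in zip(coef, lams))         # P{S > cutoff}
    excess = sum(ci * math.exp(-li * cutoff) / li for ci, li in zip(coef, lams))  # E[(S - cutoff)^+]
    return cutoff + excess / surv


rates = 1 / (np.arange(1, 5) + 2)        # 1/3, 1/4, 1/5, 1/6
t, cutoff, k = 0.14, 62.0, 200_000
C = float(np.prod(rates / (rates - t)))  # product of the MGFs M_i(t), about 81.635

rng = np.random.default_rng(7)
x = rng.exponential(1 / (rates - t), size=(k, 4))   # tilted rates 1/(i+2) - 0.14
s = x.sum(axis=1)
w = (s > cutoff) * np.exp(-t * s)                   # I(S_j > 62) e^{-0.14 S_j}

num = C * (s * w).mean()     # estimates E[S I(S > 62)]
den = C * w.mean()           # estimates E[I(S > 62)] = P{S > 62}
theta_hat = num / den        # ratio estimator (8.14); C cancels
exact = exact_conditional_mean(rates.tolist(), cutoff)
print(C, s.mean(), den, theta_hat, exact)
```

The constant $C$ is needed for the two individual estimates but cancels in the ratio, which is why the text's final estimator does not contain it.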


The importance sampling approach is also useful in that it enables us to estimate two (or more) distinct quantities in a single simulation. For example, suppose that

$$\theta_1 = E[h(Y)] \quad \text{and} \quad \theta_2 = E[h(W)]$$

where $Y$ and $W$ are random vectors having joint density functions $f$ and $g$, respectively. If we now simulate $W$, we can simultaneously use $h(W)$ and $h(W)f(W)/g(W)$ as estimators of $\theta_2$ and $\theta_1$, respectively. For example, suppose we simulate $T$, the total time in the system of the first $r$ customers in a queueing system in which the service distribution is exponential with mean 2. If we now decide that we really should have considered the same system but with a service distribution that is gamma distributed with parameters $(2, 1)$, then it is not necessary to repeat the simulation; we can just use the estimator

$$T \prod_{i=1}^r \frac{S_i \exp\{-S_i\}}{\frac{1}{2}\exp\{-S_i/2\}} = 2^r\, T \left(\prod_{i=1}^r S_i\right) \exp\left\{-\frac{1}{2}\sum_{i=1}^r S_i\right\}$$

where $S_i$ is the (exponentially) generated service time of customer $i$. [The above follows since the exponential service time density is $g(s) = \frac{1}{2}e^{-s/2}$, whereas the gamma $(2, 1)$ density is $f(s) = se^{-s}$.]
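A NumPy sketch of reusing one simulation for two service distributions. The text does not specify the arrival process, so Poisson arrivals at rate 0.3 and $r = 5$ are hypothetical choices here; $T$ is computed with Lindley's recursion for waiting times. The check at the end runs a separate simulation with gamma$(2, 1)$ services directly.

```python
import numpy as np


def total_time_in_system(inter, serv):
    """T = sum over the first r customers of (waiting time + service time)."""
    k, r = serv.shape
    wait = np.zeros(k)
    total = np.zeros(k)
    for i in range(r):
        if i > 0:
            # Lindley: wait = max(previous wait + previous service - interarrival, 0)
            wait = np.maximum(wait + serv[:, i - 1] - inter[:, i], 0.0)
        total += wait + serv[:, i]
    return total


r, k, arrival_rate = 5, 200_000, 0.3    # hypothetical system parameters
rng = np.random.default_rng(3)
inter = rng.exponential(1 / arrival_rate, size=(k, r))

# One simulation with exponential (mean 2) services estimates E[T] for that system ...
serv = rng.exponential(2.0, size=(k, r))
T = total_time_in_system(inter, serv)
# ... and reweighting by prod_i f(S_i)/g(S_i) = prod_i 2 S_i e^{-S_i/2} targets gamma(2, 1) services
weights = np.prod(2 * serv * np.exp(-serv / 2), axis=1)
est_exponential = T.mean()
est_gamma = (T * weights).mean()

# Independent check: simulate gamma(2, 1) services directly
rng2 = np.random.default_rng(4)
inter2 = rng2.exponential(1 / arrival_rate, size=(k, r))
serv2 = rng2.gamma(2.0, 1.0, size=(k, r))
direct = total_time_in_system(inter2, serv2).mean()
print(est_exponential, est_gamma, direct)
```

Each factor $2se^{-s/2}$ is at most $4/e$, so the weights stay bounded here; with a larger $r$ their product becomes more variable.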

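Importance sampling can also be used to estimate tail probabilities of a random variable $X$ whose density $f$ is known, but whose distribution function is difficult to evaluate. Suppose we wanted to estimate $P_f\{X > a\}$, where the subscript $f$ is used to indicate that $X$ has density function $f$, and where $a$ is a specified value. Letting

$$I(X > a) = \begin{cases} 1 & \text{if } X > a \\ 0 & \text{if } X \le a \end{cases}$$

we have the following:

$$
\begin{aligned}
P_f\{X > a\} &= E_f[I(X > a)] \\
&= E_g\!\left[I(X > a)\, \frac{f(X)}{g(X)}\right] \quad \text{(the importance sampling identity)} \\
&= E_g\!\left[I(X > a)\, \frac{f(X)}{g(X)} \,\Big|\, X > a\right] P_g\{X > a\} + E_g\!\left[I(X > a)\, \frac{f(X)}{g(X)} \,\Big|\, X \le a\right] P_g\{X \le a\} \\
&= E_g\!\left[\frac{f(X)}{g(X)} \,\Big|\, X > a\right] P_g\{X > a\}
\end{aligned}
$$

If we let $g$ be the exponential density

$$g(x) = \lambda e^{-\lambda x}, \quad x > 0$$

the preceding shows that, for $a > 0$,

$$P_f\{X > a\} = \frac{e^{-\lambda a}}{\lambda}\, E_g\!\left[e^{\lambda X} f(X) \,\Big|\, X > a\right]$$

Because the conditional distribution of an exponential random variable that is conditioned to exceed $a$ has the same distribution as $a$ plus the exponential, the preceding gives that

$$P_f\{X > a\} = \frac{e^{-\lambda a}}{\lambda}\, E_g\!\left[e^{\lambda(X + a)} f(X + a)\right] = \frac{1}{\lambda}\, E_g\!\left[e^{\lambda X} f(X + a)\right]$$

Thus, we can estimate the tail probability $P_f\{X > a\}$ by generating $X_1,\ldots,X_k$, independent exponential random variables with rate $\lambda$, and then using

$$\frac{1}{\lambda k}\sum_{i=1}^k e^{\lambda X_i} f(X_i + a)$$

as the estimator.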

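As an illustration of the preceding, suppose that $f$ is the density function of a standard normal random variable $Z$, and that $a > 0$. With $X$ being an exponential random variable with rate $\lambda = a$, the preceding yields that

$$P\{Z > a\} = \frac{1}{a}\, E\!\left[e^{aX} \frac{1}{\sqrt{2\pi}}\, e^{-(X+a)^2/2}\right] = \frac{e^{-a^2/2}}{a\sqrt{2\pi}}\, E\!\left[e^{-X^2/2}\right]$$

Thus we can estimate $P\{Z > a\}$ by generating $X$, an exponential random variable with rate $a$, and then using

$$\mathrm{EST} = \frac{e^{-a^2/2}}{a\sqrt{2\pi}}\, e^{-X^2/2}$$

as the estimator. To compute the variance of this estimator note that

$$
\begin{aligned}
E[e^{-X^2/2}] &= \int_0^\infty e^{-x^2/2}\, a e^{-ax}\, dx \\
&= a\int_0^\infty \exp\{-(x^2 + 2ax)/2\}\, dx \\
&= a e^{a^2/2} \int_0^\infty \exp\{-(x + a)^2/2\}\, dx \\
&= a e^{a^2/2} \int_a^\infty \exp\{-y^2/2\}\, dy \\
&= a e^{a^2/2} \sqrt{2\pi}\, \bar\Phi(a)
\end{aligned}
$$

where $\bar\Phi = 1 - \Phi$ is the standard normal tail distribution function. Similarly, we can show that

$$E[e^{-X^2}] = a e^{a^2/4} \sqrt{\pi}\, \bar\Phi(a/\sqrt{2})$$

Combining the preceding then yields $\mathrm{Var}(\mathrm{EST})$. For instance, when $a = 3$,

$$E[e^{-X^2/2}] = 3e^{4.5}\sqrt{2\pi}\, \bar\Phi(3) \approx 0.9138$$

and

$$E[e^{-X^2}] = 3e^{2.25}\sqrt{\pi}\, \bar\Phi(2.1213) \approx 0.8551$$

giving that

$$\mathrm{Var}(e^{-X^2/2}) \approx 0.8551 - (0.9138)^2 = 0.0201$$

Because $e^{-4.5}/(3\sqrt{2\pi}) = 0.001477$, we obtain, when $a = 3$, that

$$\mathrm{Var}(\mathrm{EST}) = (0.001477)^2\, \mathrm{Var}(e^{-X^2/2}) \approx 4.38 \times 10^{-8}$$

As a comparison, the variance of the raw simulation estimator, equal to 1 if a generated standard normal exceeds 3 and to 0 otherwise, is $P\{Z > 3\}(1 - P\{Z > 3\}) = 0.00134$. Indeed, the variance of EST is so small that the estimate from a single exponential will, with 95 percent confidence, be within $\pm 0.0004$ of the correct answer.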
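A Python sketch of the general tail estimator $\frac{1}{\lambda}e^{\lambda X} f(X + a)$, applied to the standard normal with $\lambda = a = 3$ as in the illustration. The run count and seed are arbitrary; `math.erfc` supplies the exact $\bar\Phi$ for comparison.

```python
import math
import random


def tail_prob_is(f, a, lam, k, seed=0):
    """k values of e^{lam X} f(X + a) / lam with X ~ exponential(rate lam); their mean estimates P_f{X > a}."""
    rng = random.Random(seed)
    vals = []
    for _ in range(k):
        x = rng.expovariate(lam)
        vals.append(math.exp(lam * x) * f(x + a) / lam)
    return vals


def std_normal_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


a = 3.0
vals = tail_prob_is(std_normal_pdf, a, lam=a, k=100_000)   # lam = a, as in the text
est = sum(vals) / len(vals)
var = sum((v - est) ** 2 for v in vals) / (len(vals) - 1)
print(est, 0.5 * math.erfc(a / math.sqrt(2)), var)         # estimate, exact P{Z > 3}, Var(EST)
```

With $\lambda = a$ each value equals $\frac{e^{-a^2/2}}{a\sqrt{2\pi}}e^{-X^2/2}$, so the sample variance should be close to the $4.38 \times 10^{-8}$ computed above.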


8.7 Using Common Random Numbers

Suppose that each of $n$ jobs is to be processed by either of a pair of identical machines. Let $T_i$ denote the processing time for job $i$, $i = 1,\ldots,n$. We are interested in comparing the time it takes to complete the processing of all the jobs under two different policies for deciding the order in which to process jobs. Whenever a machine becomes free, the first policy, called longest job first, always chooses the remaining job having the longest processing time, whereas the second policy, called shortest job first, always selects the one having the shortest processing time. For example, if $n = 3$ and $T_1 = 2$, $T_2 = 5$, and $T_3 = 3$, then the longest job first policy would complete processing at time 5, whereas the shortest job first policy would not get done until time 7. We would like to use simulation to compare the expected difference in the completion times under these two policies when the times to process jobs, $T_1,\ldots,T_n$, are random variables having a given distribution $F$.


In other words, if $g(t_1,\ldots,t_n)$ is the time it takes to process the $n$ jobs having processing times $t_1,\ldots,t_n$ when we use the longest job first policy and if $h(t_1,\ldots,t_n)$ is the time when we use the shortest job first policy, then we are interested in using simulation to estimate

$$\theta = \theta_1 - \theta_2$$

where

$$\theta_1 = E[g(T)], \quad \theta_2 = E[h(T)], \quad T = (T_1,\ldots,T_n)$$

If we now generate the vector $T$ to compute $g(T)$, the question arises whether we should use those same generated values to compute $h(T)$ or whether it is more efficient to generate an independent set to estimate $\theta_2$. To answer this question suppose that we used $T^* = (T_1^*,\ldots,T_n^*)$, having the same distribution as $T$, to estimate $\theta_2$. Then the variance of the estimator $g(T) - h(T^*)$ of $\theta$ is

$$
\begin{aligned}
\mathrm{Var}(g(T) - h(T^*)) &= \mathrm{Var}(g(T)) + \mathrm{Var}(h(T^*)) - 2\,\mathrm{Cov}(g(T), h(T^*)) \\
&= \mathrm{Var}(g(T)) + \mathrm{Var}(h(T)) - 2\,\mathrm{Cov}(g(T), h(T^*))
\end{aligned}
\qquad (8.15)
$$

Hence, if $g(T)$ and $h(T)$ are positively correlated, that is, if their covariance is positive, then the variance of the estimator of $\theta$ is smaller if we use the same set of generated random values $T$ to compute both $g(T)$ and $h(T)$ than it would be if we used an independent set $T^*$ to compute $h(T^*)$ [in this latter case the covariance in (8.15) would be 0].

Since both $g$ and $h$ are increasing functions of their arguments, it follows, because increasing functions of independent random variables are positively correlated (see the Appendix of this chapter for a proof), that in the above case it is more efficient to successively compare the policies by always using the same set of generated job times for both policies.

As a general rule of thumb when comparing different operating policies in a randomly determined environment, after the environmental state has been simulated one should then evaluate all the policies for this environment. That is, if the environment is determined by the vector $T$ and $g_i(T)$ is the return from policy $i$ under the environmental state $T$, then after simulating the value of the random vector $T$ one should evaluate, for that value of $T$, all the returns $g_i(T)$.
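A Python sketch of the two-machine comparison, estimating $\theta = \theta_1 - \theta_2$ once with common random numbers ($T^* = T$) and once with an independent $T^*$. Exponential job times with mean 1 and $n = 10$ stand in for the unspecified distribution $F$; both are hypothetical choices. A machine that becomes free takes the next job in priority order, so each job simply goes to whichever machine frees up first.

```python
import random
import statistics


def makespan(times, longest_first):
    """Completion time on two identical machines; a freed machine takes the next job in priority order."""
    free_at = [0.0, 0.0]
    for job in sorted(times, reverse=longest_first):
        i = 0 if free_at[0] <= free_at[1] else 1   # the machine that becomes free first
        free_at[i] += job
    return max(free_at)


def compare(n, runs, common, seed=0):
    """Samples of g(T) - h(T*), where T* = T under common random numbers, else an independent copy."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(runs):
        t = [rng.expovariate(1.0) for _ in range(n)]   # hypothetical F: exponential, mean 1
        t_star = t if common else [rng.expovariate(1.0) for _ in range(n)]
        diffs.append(makespan(t, True) - makespan(t_star, False))
    return diffs


print(makespan([2, 5, 3], True), makespan([2, 5, 3], False))   # the text's example: 5 and 7
crn = compare(n=10, runs=20_000, common=True)
ind = compare(n=10, runs=20_000, common=False)
print(statistics.mean(crn), statistics.variance(crn))
print(statistics.mean(ind), statistics.variance(ind))
```

Both versions estimate the same $\theta$, but reusing $T$ makes the covariance term in (8.15) positive, so the per-run variance of the difference drops sharply.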


8.8 Evaluating an Exotic Option

With time 0 taken to be the current time, let $P(y)$ denote the price of a stock at time $y$. A common assumption is that a stock's price evolves over time according to a geometric Brownian motion process. This means that, for any price history up to time $y$, the ratio of the price at time $t + y$ to that at time $y$ has a lognormal distribution with mean parameter $\mu t$ and variance parameter $t\sigma^2$. That is, independent of the price history up to time $y$, the random variable

$$\log\left(\frac{P(t+y)}{P(y)}\right)$$

has a normal distribution with mean $\mu t$ and variance $t\sigma^2$. The parameters $\mu$ and $\sigma$ are called, respectively, the drift and the volatility of the geometric Brownian motion.

A European call option on the stock, having expiration time $t$ and strike $K$, gives its owner the right, but not the obligation, to purchase the stock at time $t$ for a fixed price $K$. The option will be exercised at time $t$ provided that $P(t) > K$. Because we are able to purchase a stock whose market price is $P(t)$ for the price $K$, we say that our gain in this case is $P(t) - K$. Thus, in general, the gain at time $t$ from the option is

$$(P(t) - K)^+$$

where

$$x^+ = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases}$$

For a given initial price $P(0) = v$, let $C(K, t, v)$ denote the expected value of the payoff from a $K$, $t$ European call option. Using that

$$W = \log(P(t)/v)$$

is a normal random variable with mean $t\mu$ and variance $t\sigma^2$, we have that

$$C(K, t, v) = E[(P(t) - K)^+] = E[(ve^W - K)^+]$$

It is not difficult to explicitly evaluate the preceding to obtain $C(K, t, v)$.

The preceding option is called a standard (or vanilla) call option. In recent years there has been an interest in nonstandard (or exotic) options. Among the nonstandard options are the barrier options; these are options that only become alive, or become dead, when a barrier is crossed. We will now consider a type of barrier option, called an up-and-in option, that is specified not only by the price $K$ and time $t$, but also by an additional price $b$ and an additional time $s$, $s < t$. The conditions of this option are such that its holder only has the right to purchase the stock at time $t$ for price $K$ if the stock's price at time $s$ exceeds $b$. In other words, the $K$, $t$ option either becomes alive at time $s$ if $P(s) > b$, or becomes dead if $P(s) \le b$. We now show how we can efficiently use simulation to find the expected payoff of such an option.
Suppose that $P(0) = v$, and define $X$ and $Y$ by

$$X = \log\left(\frac{P(s)}{v}\right), \qquad Y = \log\left(\frac{P(t)}{P(s)}\right)$$

It follows from the properties of geometric Brownian motion that $X$ and $Y$ are independent normal random variables, with $X$ having mean $s\mu$ and variance $s\sigma^2$, and $Y$ having mean $(t-s)\mu$ and variance $(t-s)\sigma^2$. Because

$$P(s) = ve^X, \qquad P(t) = ve^{X+Y}$$

we can write the payoff from the option as

$$\text{payoff} = I(ve^X > b)\,(ve^{X+Y} - K)^+$$

where

$$I(ve^X > b) = \begin{cases} 1, & \text{if } ve^X > b \\ 0, & \text{if } ve^X \le b \end{cases}$$

Therefore, the payoff can be simulated by generating a pair of normal random variables. The raw simulation estimator would first generate $X$. If $X$ is less than $\log(b/v)$, that run ends with payoff value 0; if $X$ is greater than $\log(b/v)$, then $Y$ is also generated and the payoff from that run is the value of $(ve^{X+Y} - K)^+$.

We can, however, significantly improve the efficiency of the simulation by a combination of the variance reduction techniques of stratified sampling and conditional expectation. To do so, let $R$ denote the payoff from the option, and write

$$
\begin{aligned}
E[R] &= E[R \mid ve^X > b]\, P\{ve^X > b\} + E[R \mid ve^X \le b]\, P\{ve^X \le b\} \\
&= E[R \mid X > \log(b/v)]\, P\{X > \log(b/v)\} \\
&= E[R \mid X > \log(b/v)]\; \bar\Phi\!\left(\frac{\log(b/v) - s\mu}{\sigma\sqrt{s}}\right)
\end{aligned}
$$

where the second term in the first line vanishes because $R = 0$ whenever $ve^X \le b$, and where $\bar\Phi = 1 - \Phi$ is the standard normal tail distribution function. Therefore, to obtain $E[R]$ it suffices to determine its conditional expectation given that $X > \log(b/v)$, which can be accomplished by first generating $X$ conditional on the event that it exceeds $\log(b/v)$. Suppose that the generated value is $x$ (we will show in the following how to generate a normal conditioned to exceed some value). Now, rather than generating the value of $Y$ to determine the simulated payoff, let us take as our estimator the conditional expected payoff given the value of $X$. This conditional expectation can be computed because, as $X > \log(b/v)$, the option is alive at time $s$ and thus has the same expected payoff as would a standard option when the initial price of the security is $ve^X$ and the option expires after an additional time $t - s$. That is, after we simulate $X$ conditional on it exceeding $\log(b/v)$, we should use the following estimator for the expected payoff of the barrier option:

$$\text{Estimator} = C(K, t-s, ve^X)\; \bar\Phi\!\left(\frac{\log(b/v) - s\mu}{\sigma\sqrt{s}}\right) \qquad (8.16)$$

After $k$ simulation runs, with $X_i$ being the generated value of the conditioned normal on run $i$, the estimator is

$$\bar\Phi\!\left(\frac{\log(b/v) - s\mu}{\sigma\sqrt{s}}\right) \frac{1}{k}\sum_{i=1}^k C(K, t-s, ve^{X_i})$$
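A Python sketch comparing the raw estimator with the conditional estimator (8.16). The option and price parameters are hypothetical. $C(K, t, v)$ is evaluated with the standard lognormal closed form $E[(ve^W - K)^+] = ve^{m + s_W^2/2}\Phi(d_2 + s_W) - K\Phi(d_2)$, where $m = t\mu$, $s_W = \sigma\sqrt{t}$, and $d_2 = (\log(v/K) + m)/s_W$. To keep the sketch short, $Z$ given $Z > c$ is drawn by inverse transform through `statistics.NormalDist`, which is a swapped-in technique; the text develops a rejection method for that step next.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def call_value(K, t, v, mu, sigma):
    """C(K, t, v) = E[(v e^W - K)^+] with W ~ N(t mu, t sigma^2), via the lognormal closed form."""
    m, sd = t * mu, sigma * math.sqrt(t)
    d2 = (math.log(v / K) + m) / sd
    return v * math.exp(m + sd * sd / 2) * STD.cdf(d2 + sd) - K * STD.cdf(d2)


def barrier_conditional(K, b, s, t, v, mu, sigma, k, seed=0):
    """Estimator (8.16) averaged over k runs, with X generated conditional on X > log(b/v)."""
    rng = random.Random(seed)
    c = (math.log(b / v) - s * mu) / (sigma * math.sqrt(s))
    lower = STD.cdf(c)
    total = 0.0
    for _ in range(k):
        # Z given Z > c by inverse transform (stand-in for the rejection method in the text)
        p = min(lower + rng.random() * (1 - lower), 1 - 1e-16)
        x = s * mu + sigma * math.sqrt(s) * STD.inv_cdf(p)
        total += call_value(K, t - s, v * math.exp(x), mu, sigma)
    return (1 - lower) * total / k


def barrier_raw(K, b, s, t, v, mu, sigma, k, seed=0):
    """Raw estimator: generate X, and Y only when the option comes alive."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(k):
        x = rng.gauss(s * mu, sigma * math.sqrt(s))
        if v * math.exp(x) > b:
            y = rng.gauss((t - s) * mu, sigma * math.sqrt(t - s))
            total += max(v * math.exp(x + y) - K, 0.0)
    return total / k


params = dict(K=100.0, b=110.0, s=0.5, t=1.0, v=100.0, mu=0.05, sigma=0.3)   # hypothetical
cond_est = barrier_conditional(k=20_000, **params)
raw_est = barrier_raw(k=200_000, **params)
print(cond_est, raw_est)
```

The conditional estimator uses one tenth as many runs here and still tends to be the more precise of the two, since it never produces the zero payoffs of dead runs and never samples $Y$.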


We now show how to generate $X$ conditional on it exceeding $\log(b/v)$. Because $X$ can be expressed as

$$X = s\mu + \sigma\sqrt{s}\, Z \qquad (8.17)$$

where $Z$ is a standard normal random variable, this is equivalent to generating $Z$ conditional on the event that

$$Z > c \equiv \frac{\log(b/v) - s\mu}{\sigma\sqrt{s}} \qquad (8.18)$$

Thus, we need to generate a standard normal conditioned to exceed $c$.

When $c < 0$, we can just generate standard normals until we obtain one larger than $c$. The more interesting situation is when $c > 0$. In this case, an efficient procedure is to use the rejection technique with $g$ being the density function of $c + Y$, where $Y$ is an exponential random variable whose rate $\lambda$ will be determined in the following. The density function of $c + Y$ is

$$g(x) = \lambda e^{-\lambda(x - c)} = \lambda e^{\lambda c} e^{-\lambda x}, \quad x > c$$

whereas that of the standard normal conditioned to exceed $c$ is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\bar\Phi(c)}\, e^{-x^2/2}, \quad x > c$$

Consequently,

$$\frac{f(x)}{g(x)} = \frac{e^{-\lambda c}}{\lambda\, \bar\Phi(c)\, \sqrt{2\pi}}\, e^{\lambda x - x^2/2}$$

Because $e^{\lambda x - x^2/2}$ is maximized when $x = \lambda$, we obtain that

$$\max_x \frac{f(x)}{g(x)} \le C(\lambda) \equiv \frac{e^{\lambda^2/2 - \lambda c}}{\lambda\, \bar\Phi(c)\, \sqrt{2\pi}}$$

Calculus now shows that $C(\lambda)$ is minimized when

$$\lambda = \frac{c + \sqrt{c^2 + 4}}{2}$$

Take the preceding to be the value of $\lambda$. Because

$$\frac{f(x)}{C(\lambda)\, g(x)} = e^{\lambda x - x^2/2 - \lambda^2/2} = e^{-(x - \lambda)^2/2}$$

we see that the following algorithm generates a standard normal random variable that is conditioned to exceed the positive value $c$.

1. Set $\lambda = \left(c + \sqrt{c^2 + 4}\right)/2$.
2. Generate $U_1$, and set $Y = -\frac{1}{\lambda}\log(U_1)$ and $V = c + Y$.
3. Generate $U_2$.
4. If $U_2 \le e^{-(V - \lambda)^2/2}$, stop; otherwise return to 2.

The value of $V$ obtained is distributed as a standard normal random variable that is conditioned to exceed $c > 0$.
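Steps 1 to 4 translate directly into Python. In this sketch `rng.expovariate(lam)` stands in for $-\frac{1}{\lambda}\log(U_1)$; the two are equivalent in distribution. The value $c = 3$ and the seed are illustrative choices, and the sample mean is compared with $E[Z \mid Z > c] = e^{-c^2/2}/(\sqrt{2\pi}\,\bar\Phi(c))$.

```python
import math
import random


def normal_above(c, rng):
    """Standard normal conditioned to exceed c > 0, by rejection from c + exponential(lambda)."""
    lam = (c + math.sqrt(c * c + 4)) / 2                    # Step 1
    while True:
        v = c + rng.expovariate(lam)                         # Step 2: V = c + Y
        if rng.random() <= math.exp(-(v - lam) ** 2 / 2):   # Steps 3 and 4
            return v


rng = random.Random(11)
c = 3.0
sample = [normal_above(c, rng) for _ in range(50_000)]
phibar = 0.5 * math.erfc(c / math.sqrt(2))
mean_exact = math.exp(-c * c / 2) / (math.sqrt(2 * math.pi) * phibar)
print(sum(sample) / len(sample), mean_exact)
```

Because $C(\lambda) \approx 1.04$ when $c = 3$, roughly 96 percent of the proposals are accepted.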


Remarks 


o The preceding algorithm for generating a standard normal conditioned to 
exceed c is very efficient, particularly when c is large. For instance, if c = 3 
then A + 3.3 and C(A) © 1.04. 

The inequality in Step 4 can be rewritten as 


—log(U,) > (V—A)?/2 


Using that —log(U,) is exponential with rate 1, and that conditional on an 
exponential exceeding a value the amount by which it exceeds it is also 
exponential with the same rate, it follows that not only does the preceding 
algorithm yield a standard normal conditioned to exceed c, but it also gives 
an independent exponential random variable with rate 1, which can then be 
used in generating the next conditioned standard normal. 
• Using that $C(K, t, v)$, the expected payoff of a standard option, is an increasing function of the stock's initial price $v$, it follows that the estimator given by (8.16) is increasing in $X$. Equivalently, using the representation of Equation (8.17), the estimator (8.16) is increasing in $Z$. This suggests the use of $Z$ as a control variate. Because $Z$ is generated conditional on the inequality (8.18), its mean is

$$E[Z \mid Z > c] = \frac{1}{\sqrt{2\pi}\,\bar{\Phi}(c)} \int_c^\infty x e^{-x^2/2}\,dx = \frac{e^{-c^2/2}}{\sqrt{2\pi}\,\bar{\Phi}(c)}$$


• The expected return from the barrier option can be expressed as a two-dimensional integral involving the product of normal density functions. This two-dimensional integral can then be evaluated in terms of the joint probability distribution of random variables having a bivariate normal distribution. However, for more general payoff functions than $(P(t) - K)^+$, such as power payoffs of the form $[(P(t) - K)^+]^\alpha$, such expressions are not available, and the simulation procedure described might be the most efficient way to estimate the expected payoff. □
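The constants in the first remark, and the control-variate mean in the third, are easy to check numerically. The sketch below is our own (all function names are hypothetical): it evaluates $C(\lambda)$ at the optimal rate for several values of $c$, showing the constant shrinking toward 1 as $c$ grows, and it computes $E[Z \mid Z > c]$ from the closed form above.

```python
import math


def tail_prob(c):
    """Standard normal tail probability 1 - Phi(c)."""
    return 0.5 * math.erfc(c / math.sqrt(2))


def rejection_constant(lam, c):
    """C(lambda) = exp(lambda^2/2 - lambda*c) / (lambda * (1 - Phi(c)) * sqrt(2*pi))."""
    return math.exp(lam * lam / 2 - lam * c) / (
        lam * tail_prob(c) * math.sqrt(2 * math.pi))


def optimal_rate(c):
    """Positive root of lambda^2 - c*lambda - 1 = 0, the minimizer of C(lambda)."""
    return (c + math.sqrt(c * c + 4)) / 2


def control_variate_mean(c):
    """E[Z | Z > c] = exp(-c^2/2) / (sqrt(2*pi) * (1 - Phi(c)))."""
    return math.exp(-c * c / 2) / (math.sqrt(2 * math.pi) * tail_prob(c))


for c in (1.0, 2.0, 3.0, 5.0):
    lam = optimal_rate(c)
    print(f"c={c}: lambda={lam:.3f}, C={rejection_constant(lam, c):.3f}, "
          f"E[Z|Z>c]={control_variate_mean(c):.3f}")
```

Since $1/C(\lambda)$ is the acceptance probability per proposal, values this close to 1 are what make the algorithm efficient.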


8.9 Estimating Functions of Random Permutations and Random Subsets


In this section we present some new variance reduction techniques useful when using simulation to estimate the expected value of a function either of a random permutation or of a random subset.


Random Permutations 


Let $I_1, \ldots, I_n$ be equally likely to be any of the $n!$ permutations of $1, \ldots, n$, and suppose we are interested in using simulation to estimate $\theta = E[f(v_{I_1}, \ldots, v_{I_n})]$, for specified values $v_1 < v_2 < \cdots < v_n$ and a specified function $f$. After generating a random permutation $V = (v_{I_1}, \ldots, v_{I_n})$, the “antithetic permutation” that reverses the order of the permutation suggests itself. Namely,

$$V_r = (v_{I_n}, \ldots, v_{I_1})$$


We will show that if $f$ is an interchange monotone function (to be defined), then using $V_r$ in conjunction with $V$ is better than evaluating $f$ at $V$ and then at another random permutation that is independent of $V$.
Let $v^{i,j} = (v_{i_1}, \ldots, v_i, \ldots, v_j, \ldots, v_{i_n})$ and $v^{j,i} = (v_{i_1}, \ldots, v_j, \ldots, v_i, \ldots, v_{i_n})$ be identical permutations of $v_1, \ldots, v_n$ with the exception that $v_i$ and $v_j$ are interchanged. Say that the function $f$, defined on permutations of $v_1, \ldots, v_n$, is an interchange increasing function if

$$f(v^{i,j}) \ge f(v^{j,i})$$

whenever $v_i > v_j$, and say that it is an interchange decreasing function if

$$f(v^{i,j}) \le f(v^{j,i})$$

whenever $v_i > v_j$. That is, the value of an interchange increasing (decreasing) function of a permutation becomes larger (smaller) as larger elements of the permutation are interchanged towards the front.


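Example 8z Suppose that $n$ jobs must be processed sequentially on a single machine. Suppose that the processing times are $t_1 \le t_2 \le \cdots \le t_n$, and that a reward $R(t)$ is earned whenever a job processing is completed at time $t$. Consequently, the total reward when the jobs are processed in the order $i_1, \ldots, i_n$ is

$$f(t_{i_1}, \ldots, t_{i_n}) = \sum_{j=1}^{n} R(t_{i_1} + \cdots + t_{i_j})$$

Interchanging a longer job toward the front increases the completion times of the jobs processed between the two interchanged positions and leaves all other completion times unchanged. It easily follows that $f$ is an interchange decreasing function when $R(t)$ is a decreasing function of $t$, and is an interchange increasing function when $R(t)$ is an increasing function of $t$. □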
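The claim at the end of Example 8z can be checked by brute force for small $n$. The sketch below is our own: the processing times $1, \ldots, 5$ and the two reward functions $e^{-t/5}$ (decreasing) and $\sqrt{t}$ (increasing) are arbitrary illustrative choices, and the hypothetical helper `is_interchange_monotone` simply tests the definition on every permutation and every pair of positions.

```python
import itertools
import math


def total_reward(times, R):
    """f(t_{i1}, ..., t_{in}) = sum_j R(t_{i1} + ... + t_{ij})."""
    elapsed, reward = 0.0, 0.0
    for t in times:
        elapsed += t
        reward += R(elapsed)
    return reward


def is_interchange_monotone(f, values, increasing=True, tol=1e-12):
    """Brute-force test of the definition: for every permutation and every
    pair of positions a < b, compare f with the larger of the two values in
    front (v^{i,j}) against f with the larger value behind (v^{j,i})."""
    for perm in itertools.permutations(values):
        for a, b in itertools.combinations(range(len(perm)), 2):
            if perm[a] == perm[b]:
                continue
            front = list(perm)
            if front[a] < front[b]:                # put the larger value at a
                front[a], front[b] = front[b], front[a]
            behind = front.copy()
            behind[a], behind[b] = behind[b], behind[a]
            diff = f(front) - f(behind)
            if (increasing and diff < -tol) or (not increasing and diff > tol):
                return False
    return True


times = (1.0, 2.0, 3.0, 4.0, 5.0)                  # illustrative t_1 <= ... <= t_5


def decreasing_R(t):
    return math.exp(-t / 5)


def increasing_R(t):
    return math.sqrt(t)


def f_dec(p):
    return total_reward(p, decreasing_R)


def f_inc(p):
    return total_reward(p, increasing_R)


print(is_interchange_monotone(f_dec, times, increasing=False))
print(is_interchange_monotone(f_inc, times, increasing=True))
```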


In the following, suppose that $V = (v_{I_1}, \ldots, v_{I_n})$ is equally likely to be any of the $n!$ permutations of $v_1, \ldots, v_n$, and let $V_r = (v_{I_n}, \ldots, v_{I_1})$.

Theorem If $g$ and $h$ are either both interchange increasing or both interchange decreasing functions defined on permutations of $v_1, \ldots, v_n$, then

$$\mathrm{Cov}(g(V), h(V_r)) \le 0$$

To prove the theorem we will use the following lemma.

Lemma If $h$ is interchange increasing, then $E[h(V) \mid I_1 = i]$ is increasing in $i$, and $E[h(V_r) \mid I_1 = i]$ is decreasing in $i$. If $h$ is interchange decreasing, then $E[h(V) \mid I_1 = i]$ is decreasing and $E[h(V_r) \mid I_1 = i]$ is increasing in $i$.
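Proof of Lemma Let $i > 1$, and let $P$ denote the set of all $(n-2)!$ permutations of the values $1, \ldots, i-2, i+1, \ldots, n$. Then, for an interchange increasing function $h$,

$$\begin{aligned}
E[h(V) \mid I_1 = i] &= \frac{1}{(n-1)!} \sum_{(x_1, \ldots, x_{n-2}) \in P} \sum_{k=1}^{n-1} h(v_i, v_{x_1}, \ldots, v_{x_{k-1}}, v_{i-1}, v_{x_k}, \ldots, v_{x_{n-2}}) \\
&\ge \frac{1}{(n-1)!} \sum_{(x_1, \ldots, x_{n-2}) \in P} \sum_{k=1}^{n-1} h(v_{i-1}, v_{x_1}, \ldots, v_{x_{k-1}}, v_i, v_{x_k}, \ldots, v_{x_{n-2}}) \\
&= E[h(V) \mid I_1 = i-1]
\end{aligned}$$

where the inequality follows because $v_{i-1} < v_i$ and $h$ is interchange increasing. Similarly,

$$\begin{aligned}
E[h(V_r) \mid I_1 = i] &= \frac{1}{(n-1)!} \sum_{(x_1, \ldots, x_{n-2}) \in P} \sum_{k=1}^{n-1} h(v_{x_1}, \ldots, v_{x_{k-1}}, v_{i-1}, v_{x_k}, \ldots, v_{x_{n-2}}, v_i) \\
&\le \frac{1}{(n-1)!} \sum_{(x_1, \ldots, x_{n-2}) \in P} \sum_{k=1}^{n-1} h(v_{x_1}, \ldots, v_{x_{k-1}}, v_i, v_{x_k}, \ldots, v_{x_{n-2}}, v_{i-1}) \\
&= E[h(V_r) \mid I_1 = i-1]
\end{aligned}$$

The proof for an interchange decreasing function $h$ is similar. □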
Proof of Theorem The proof is by induction on $n$. Suppose $g$ and $h$ are interchange increasing functions. As the theorem is true for $n = 1$ (because the random vector is deterministic in this case, the covariance is 0), assume it to be true for $n - 1$. Because, for fixed $j$, the functions $g(v_j, x_1, \ldots, x_{n-1})$ and $h(x_1, \ldots, x_{n-1}, v_j)$ are both interchange increasing functions defined on the $(n-1)!$ permutations $(x_1, \ldots, x_{n-1})$ of $v_1, \ldots, v_{j-1}, v_{j+1}, \ldots, v_n$, it follows from the induction hypothesis that

$$\mathrm{Cov}(g(V), h(V_r) \mid I_1 = j) \le 0$$

implying that

$$E[\mathrm{Cov}(g(V), h(V_r) \mid I_1)] \le 0$$

It follows from the Lemma that $E[g(V) \mid I_1]$ is increasing in $I_1$, whereas $E[h(V_r) \mid I_1]$ is decreasing in $I_1$. Consequently, because an increasing and a decreasing function of the same random variable have nonpositive covariance (the $n = 1$ case of the theorem in Section 8.10, applied to one function and the negative of the other),

$$\mathrm{Cov}(E[g(V) \mid I_1], E[h(V_r) \mid I_1]) \le 0$$

The result now follows upon application of the conditional covariance identity, which states that for any random variables $X$, $Y$, and $Z$,

$$\mathrm{Cov}(X, Y) = E[\mathrm{Cov}(X, Y \mid Z)] + \mathrm{Cov}(E[X \mid Z], E[Y \mid Z])$$

The proof when $g$ and $h$ are interchange decreasing functions is similar. □


Letting $g = h = f$ in the Theorem shows, when $f$ is an interchange monotone function, that evaluating $f$ at $V$ and at $V_r$ is better than evaluating $f$ at $V$ and then at another random permutation that is independent of $V$. (Since $f(V_r)$ has the same distribution as $f(V)$, $\mathrm{Var}[(f(V) + f(V_r))/2] = [\mathrm{Var}(f(V)) + \mathrm{Cov}(f(V), f(V_r))]/2$, which is at most $\mathrm{Var}(f(V))/2$.) For instance, if in Example 8z we wanted to use simulation to estimate the expected return when the jobs are processed in a random order, then this use of “antithetic permutations” will lead to a variance reduction when compared to the approach of generating independent permutations.
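For small $n$ the claim can be confirmed exactly by averaging over all $n!$ orders. The sketch below is our own; the processing times $1, \ldots, 5$ and the decreasing reward $R(t) = e^{-t/5}$ are arbitrary illustrative choices. It computes $\mathrm{Cov}(f(V), f(V_r))$, the variances of the antithetic and independent-pair estimators, and, as a check of the Lemma, $E[f(V) \mid \text{first job}]$ for each possible first job. These conditional means should decrease as the first job gets longer, because this $f$ is interchange decreasing.

```python
import itertools
import math
from statistics import fmean


def total_reward(times, R):
    """Total reward of Example 8z for jobs processed in the given order."""
    elapsed, reward = 0.0, 0.0
    for t in times:
        elapsed += t
        reward += R(elapsed)
    return reward


def R(t):
    """Hypothetical decreasing reward function."""
    return math.exp(-t / 5)


times = (1.0, 2.0, 3.0, 4.0, 5.0)                # illustrative t_1 <= ... <= t_5
perms = list(itertools.permutations(times))       # all n! equally likely orders
f_V = [total_reward(p, R) for p in perms]
f_Vr = [total_reward(p[::-1], R) for p in perms]  # antithetic (reversed) order

m = fmean(f_V)                                    # f(V_r) has the same mean
var_f = fmean([(x - m) ** 2 for x in f_V])
cov = fmean([(x - m) * (y - m) for x, y in zip(f_V, f_Vr)])

var_antithetic = (2 * var_f + 2 * cov) / 4        # Var[(f(V) + f(V_r)) / 2]
var_independent = var_f / 2                       # two independent permutations

# Lemma check: E[f(V) | first job] for first jobs of increasing length.
cond_means = [fmean([x for p, x in zip(perms, f_V) if p[0] == t]) for t in times]

print(f"Cov = {cov:.5f}")
print(f"Var antithetic = {var_antithetic:.5f}, Var independent = {var_independent:.5f}")
```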


Remark The theorem is not true without conditions on $g$ and $h$. For instance, suppose the function $f$ defined on permutations of $1, \ldots, n$ is very large for permutations having 1 either near the front or near the back of the permutation. Then $f(I_1, \ldots, I_n)$ and $f(I_n, \ldots, I_1)$ would be positively correlated when $I_1, \ldots, I_n$ is a random permutation of $1, \ldots, n$.


Random Subsets 


Suppose now that we want to use simulation to determine $\theta = E[g(B)]$, where $B$ is equally likely to be any of the $\binom{n}{k}$ subsets of $S = \{1, 2, \ldots, n\}$ that contain $k$ elements, and $g$ is a function defined on $k$-element subsets of $S$. Say that the function $g$ is increasing (decreasing) if for all subsets $A$ of size $k - 1$, $g(A \cup i)$ is an increasing (decreasing) function of $i$ for $i \notin A$. Now, rather than generating independent subsets of size $k$ to estimate $\theta$, one can also generate first a random $k$-element subset of $S$, call it $R_1$; then generate a random $k$-element subset from $S - R_1$, call it $R_2$; then generate a random $k$-element subset from $S - R_1 - R_2$, call it $R_3$; and so on. Using the previously proven theorem, we now show that when $g$ is a monotone function this latter approach results in a better estimate of $\theta$ than would be obtained from generating independent subsets.


Corollary With $R_j$ as specified in the preceding,

$$\mathrm{Cov}(g(R_i), g(R_j)) \le 0, \qquad i \ne j$$

when $g$ is either an increasing or decreasing function.

Proof Suppose $n \ge 2k$. Define the function $f$ on permutations of $S$ by

$$f(i_1, \ldots, i_n) = g(i_1, \ldots, i_k)$$

where $g(i_1, \ldots, i_k) = g(\{i_1, \ldots, i_k\})$. Because $f$ is an interchange monotone function, it follows from the preceding Theorem that, for a random permutation $I_1, \ldots, I_n$, the covariance between $f(I_1, \ldots, I_n)$ and $f(I_n, \ldots, I_1)$ is nonpositive. But this means that

$$\mathrm{Cov}(g(I_1, \ldots, I_k), g(I_{n-k+1}, \ldots, I_n)) \le 0$$

The result now follows because the joint distribution of $g(I_1, \ldots, I_k)$ and $g(I_{n-k+1}, \ldots, I_n)$ is the same as the joint distribution of $g(R_i)$ and $g(R_j)$ whenever $R_i$ and $R_j$ are randomly chosen non-overlapping $k$-element subsets of $S$. □
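For small cases the corollary can be verified exactly by enumeration. In this sketch (our own; the choices $n = 6$, $k = 3$, and $g(A) = \max A$ are arbitrary) we take $n = 2k$, so that $R_2$ is the complement of $R_1$, and compute $\mathrm{Cov}(g(R_1), g(R_2))$ with exact rational arithmetic.

```python
import itertools
from fractions import Fraction

n, k = 6, 3                                   # n = 2k, so R_2 is the complement of R_1
S = frozenset(range(1, n + 1))


def g(subset):
    """An increasing function of a k-element subset (illustrative)."""
    return max(subset)


subsets = [frozenset(c) for c in itertools.combinations(sorted(S), k)]
p = Fraction(1, len(subsets))                 # each k-subset equally likely

mean_g = sum(p * g(A) for A in subsets)
e_prod = sum(p * g(A) * g(S - A) for A in subsets)
cov = e_prod - mean_g ** 2                    # Cov(g(R_1), g(R_2)), exactly
print(cov)                                    # prints -9/16
```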


Remarks 


(a) Suppose $n$ is not an integral multiple of $k$, say $n = ki + j$, where $0 < j < k$. Then after generating $R_1, \ldots, R_i$ one could either start over, or (better yet) one can generate one additional $k$-element subset of $S$ by using the $j$ elements not in any of $R_1, \ldots, R_i$ along with a random selection of $k - j$ elements of $\cup_{l=1}^{i} R_l$.

(b) The preceding corollary is not true without some conditions on $g$. To see this, suppose that $n = 2k$ and that $g(R)$ is large (say, equal to 1) when $R$ contains exactly one of the values 1, 2 and is small (say, equal to 0) otherwise. Then $g(R)$ and $g(R^c)$, where $R^c = S - R$, would be positively correlated: if $R$ contains exactly one of 1, 2, then so does $R^c$.
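The sequential procedure for generating $R_1, R_2, \ldots$, together with the padding of Remark (a), might be coded as follows. The sketch is our own (the function names are hypothetical). It draws all of $R_1, R_2, \ldots$ at once from a single shuffle of $S$, which gives the same joint distribution as the sequential draws. Here `max` serves only as an example of an increasing $g$; for it $E[g(B)] = k(n+1)/(k+1)$, which the estimate should approach.

```python
import random
from statistics import fmean


def disjoint_subsets(n, k, rng=random):
    """Return successive random k-element subsets R_1, R_2, ... of {1, ..., n}.

    Shuffling S once and cutting it into consecutive blocks gives the same
    joint distribution as drawing R_1 from S, R_2 from S - R_1, and so on.
    If n = k*i + j with 0 < j < k, one extra subset is formed from the j
    leftover elements plus k - j elements drawn from the earlier blocks,
    as in Remark (a)."""
    s = list(range(1, n + 1))
    rng.shuffle(s)
    i, j = divmod(n, k)
    blocks = [set(s[r * k:(r + 1) * k]) for r in range(i)]
    if j > 0:
        used = s[:i * k]
        blocks.append(set(s[i * k:]) | set(rng.sample(used, k - j)))
    return blocks


def estimate(g, n, k, runs, rng=random):
    """Average g over all subsets produced in `runs` independent passes."""
    values = [g(B) for _ in range(runs) for B in disjoint_subsets(n, k, rng)]
    return fmean(values)


rng = random.Random(11)
print(disjoint_subsets(10, 3, rng))
print(estimate(max, 10, 3, 1000, rng))        # exact value: 3 * 11 / 4 = 8.25
```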


8.10 Appendix: Verification of Antithetic Variable Approach When Estimating the Expected Value of Monotone Functions


The following theorem is the key to showing that the use of antithetic variables will lead to a reduction in variance in comparison with generating a new independent set of random numbers whenever the function $h$ is monotone in each of its coordinates.


Theorem If $X_1, \ldots, X_n$ are independent, then for any increasing functions $f$ and $g$ of $n$ variables

$$E[f(X)g(X)] \ge E[f(X)]\,E[g(X)] \qquad (8.19)$$

where $X = (X_1, \ldots, X_n)$.


Proof The proof is by induction on $n$. To prove it when $n = 1$, let $f$ and $g$ be increasing functions of a single variable. Then for any $x$ and $y$

$$[f(x) - f(y)][g(x) - g(y)] \ge 0$$

since if $x \ge y$ ($x \le y$) then both factors are nonnegative (nonpositive). Hence, for any random variables $X$ and $Y$,

$$[f(X) - f(Y)][g(X) - g(Y)] \ge 0$$

implying that

$$E\{[f(X) - f(Y)][g(X) - g(Y)]\} \ge 0$$

or, equivalently,

$$E[f(X)g(X)] + E[f(Y)g(Y)] \ge E[f(X)g(Y)] + E[f(Y)g(X)]$$

If we now suppose that $X$ and $Y$ are independent and identically distributed then, since in this case

$$\begin{aligned}
E[f(X)g(X)] &= E[f(Y)g(Y)] \\
E[f(X)g(Y)] &= E[f(Y)g(X)] = E[f(X)]\,E[g(X)]
\end{aligned}$$

we obtain the result when $n = 1$.
So assume that Equation (8.19) holds for $n - 1$ variables, and now suppose that $X_1, \ldots, X_n$ are independent and $f$ and $g$ are increasing functions. Then

$$\begin{aligned}
E[f(X)g(X) \mid X_n = x] &= E[f(X_1, \ldots, X_{n-1}, x)\,g(X_1, \ldots, X_{n-1}, x) \mid X_n = x] \\
&= E[f(X_1, \ldots, X_{n-1}, x)\,g(X_1, \ldots, X_{n-1}, x)] \quad \text{by independence} \\
&\ge E[f(X_1, \ldots, X_{n-1}, x)]\,E[g(X_1, \ldots, X_{n-1}, x)] \quad \text{by the induction hypothesis} \\
&= E[f(X) \mid X_n = x]\,E[g(X) \mid X_n = x]
\end{aligned}$$

Hence,

$$E[f(X)g(X) \mid X_n] \ge E[f(X) \mid X_n]\,E[g(X) \mid X_n]$$

and, upon taking expectations of both sides,

$$E[f(X)g(X)] \ge E\big[E[f(X) \mid X_n]\,E[g(X) \mid X_n]\big] \ge E[f(X)]\,E[g(X)]$$

The last inequality follows because $E[f(X) \mid X_n]$ and $E[g(X) \mid X_n]$ are both increasing functions of $X_n$, and so, by the result for $n = 1$,

$$E\big[E[f(X) \mid X_n]\,E[g(X) \mid X_n]\big] \ge E\big[E[f(X) \mid X_n]\big]\,E\big[E[g(X) \mid X_n]\big] = E[f(X)]\,E[g(X)]$$

□
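Inequality (8.19) can be checked exactly on a small discrete example. In the sketch below (our own illustration), $X_1$ and $X_2$ are independent with arbitrarily chosen finite distributions that need not be identical. The functions $f(x_1, x_2) = x_1 + x_2$ and $g(x_1, x_2) = \max(x_1, x_2)^2$ are increasing in each argument on the nonnegative values used, and expectations are computed with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Independent (not identically distributed) X1 and X2; values and
# probabilities are arbitrary illustrative choices.
X1 = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
X2 = {0: Fraction(1, 3), 3: Fraction(2, 3)}


def expect(h):
    """E[h(X1, X2)] under independence: sum over the product distribution."""
    return sum(p1 * p2 * h(x1, x2)
               for (x1, p1), (x2, p2) in product(X1.items(), X2.items()))


def f(x1, x2):
    return x1 + x2              # increasing in each coordinate


def g(x1, x2):
    return max(x1, x2) ** 2     # increasing in each coordinate for x >= 0


lhs = expect(lambda x1, x2: f(x1, x2) * g(x1, x2))   # E[f(X) g(X)]
rhs = expect(f) * expect(g)                          # E[f(X)] E[g(X)]
print(lhs, rhs, lhs >= rhs)
```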
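Corollary If $h(x_1, \ldots, x_n)$ is a monotone function of each of its arguments, then, for a set $U_1, \ldots, U_n$ of independent random numbers,

$$\mathrm{Cov}[h(U_1, \ldots, U_n), h(1 - U_1, \ldots, 1 - U_n)] \le 0$$

Proof By redefining $h$ we can assume, without loss of generality, that $h$ is increasing in its first $r$ arguments and decreasing in its final $n - r$. Hence, letting

$$\begin{aligned}
f(x_1, \ldots, x_n) &= h(x_1, \ldots, x_r, 1 - x_{r+1}, \ldots, 1 - x_n) \\
g(x_1, \ldots, x_n) &= -h(1 - x_1, \ldots, 1 - x_r, x_{r+1}, \ldots, x_n)
\end{aligned}$$

it follows that $f$ and $g$ are both increasing functions. Thus, by the preceding theorem,

$$\mathrm{Cov}[f(U_1, \ldots, U_n), g(U_1, \ldots, U_n)] \ge 0$$

or, equivalently,

$$\mathrm{Cov}[h(U_1, \ldots, U_r, 1 - U_{r+1}, \ldots, 1 - U_n),\ h(1 - U_1, \ldots, 1 - U_r, U_{r+1}, \ldots, U_n)] \le 0$$

The result now follows since, because each $1 - U_i$ has the same distribution as $U_i$, the random vector $h(U_1, \ldots, U_n), h(1 - U_1, \ldots, 1 - U_n)$ has the same joint distribution as does the random vector

$$h(U_1, \ldots, U_r, 1 - U_{r+1}, \ldots, 1 - U_n),\ h(1 - U_1, \ldots, 1 - U_r, U_{r+1}, \ldots, U_n)$$

□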
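The corollary is what makes the usual antithetic estimator work. The following sketch is our own. Its $h(u_1, u_2, u_3) = u_1 u_2/(1 + u_3)$ is an arbitrary example, increasing in its first two arguments and decreasing in the third. The sketch estimates $\mathrm{Cov}[h(U), h(1 - U)]$ by simulation and compares the variance of one antithetic pair average with the variance of the average of two independent evaluations. For this $h$, $E[h(U)] = (\ln 2)/4$.

```python
import math
import random
from statistics import fmean


def h(u1, u2, u3):
    """Illustrative h: increasing in u1 and u2, decreasing in u3."""
    return u1 * u2 / (1 + u3)


rng = random.Random(7)
N = 100_000
plain, anti = [], []
for _ in range(N):
    u = (rng.random(), rng.random(), rng.random())
    plain.append(h(*u))
    anti.append(h(*(1 - x for x in u)))        # same numbers, reflected

m1, m2 = fmean(plain), fmean(anti)
cov = sum((x - m1) * (y - m2) for x, y in zip(plain, anti)) / (N - 1)
var1 = sum((x - m1) ** 2 for x in plain) / (N - 1)
var2 = sum((y - m2) ** 2 for y in anti) / (N - 1)

var_antithetic = (var1 + var2 + 2 * cov) / 4   # Var of one antithetic pair average
var_independent = var1 / 2                     # Var of the average of two independent h's
print(f"Cov = {cov:.4f}; antithetic {var_antithetic:.5f} vs independent {var_independent:.5f}")
print("estimate of E[h]:", fmean([(x + y) / 2 for x, y in zip(plain, anti)]),
      "exact:", math.log(2) / 4)
```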


Exercises 
1. Suppose we wanted to estimate $\theta$, where

$$\theta = \int_0^1 e^{x^2}\,dx$$

Show that generating a random number $U$ and then using the estimator $e^{U^2}(1 + e^{1-2U})/2$ is better than generating two random numbers $U_1$ and $U_2$ and using $[\exp(U_1^2) + \exp(U_2^2)]/2$.
2. Explain how antithetic variables can be used in obtaining a simulation estimate of the quantity

$$\theta = \int_0^1 \int_0^1 e^{(x+y)^2}\,dy\,dx$$

Is it clear in this case that using antithetic variables is more efficient than generating a new pair of random numbers?


3. Let $X_i$, $i = 1, \ldots, 5$, be independent exponential random variables each with mean 1, and consider the quantity $\theta$ defined by

$$\theta = P\left\{\sum_{i=1}^{5} iX_i \ge 21.6\right\}$$

(a) Explain how we can use simulation to estimate $\theta$.
(b) Give the antithetic variable estimator.
(c) Is the use of antithetic variables efficient in this case?


4. Show that if $X$ and $Y$ have the same distribution then $\mathrm{Var}[(X + Y)/2] \le \mathrm{Var}(X)$, and conclude that the use of antithetic variables can never increase variance (although it need not be as efficient as generating an independent set of random numbers).


5. (a) If $Z$ is a standard normal random variable, design a study using antithetic variables to estimate $\theta = E[Z^3 e^Z]$.
(b) Using the above, do the simulation to obtain an interval of length no greater than 0.1 that you can assert, with 95 percent confidence, contains the value of $\theta$.


6. Suppose that X is an exponential random variable with mean 1. Give another 
random variable that is negatively correlated with X and that is also exponential 
with mean 1. 


7. Verify Equation (8.1). 
8. Verify Equation (8.2). 


9. Let $U_n$, $n \ge 1$, be a sequence of independent uniform (0, 1) random variables. Define

$$S = \min(n : U_1 + \cdots + U_n > 1)$$

It can be shown that $S$ has the same distribution as does $N$ in Example 8e, and so $E[S] = e$. In addition, if we let

$$T = \min(n : (1 - U_1) + \cdots + (1 - U_n) > 1)$$

then it can be shown that $S + T$ has the same distribution as does $N + M$ in Example 8e. This suggests the use of $(S + T + N + M)/4$ to estimate $e$. Use simulation to estimate $\mathrm{Var}[(N + M + S + T)/4]$.


10. In certain situations a random variable $X$, whose mean is known, is simulated so as to obtain an estimate of $P\{X \le a\}$ for a given constant $a$. The raw simulation estimator from a single run is $I$, where

$$I = \begin{cases} 1 & \text{if } X \le a \\ 0 & \text{if } X > a \end{cases}$$

Because $I$ and $X$ are clearly negatively correlated, a natural attempt to reduce the variance is to use $X$ as a control, and so use an estimator of the form $I + c(X - E[X])$.

(a) Determine the percentage of variance reduction over the raw estimator $I$ that is possible (by using the best $c$) if $X$ were uniform on (0, 1).

(b) Repeat (a) if $X$ were exponential with mean 1.

(c) Explain why we knew that $I$ and $X$ were negatively correlated.


11. Show that $\mathrm{Var}(\alpha X + (1 - \alpha)W)$ is minimized by $\alpha$ being equal to the value given in Equation (8.3) and determine the resulting variance.


12. (a) Explain how control variables may be used to estimate $\theta$ in Exercise 1.
(b) Do 100 simulation runs, using the control given in (a), to estimate first $c^*$ and then the variance of the estimator.
(c) Using the same data as in (b), determine the variance of the antithetic variable estimator.
(d) Which of the two types of variance reduction techniques worked better in this example?

13. Repeat Exercise 12 for $\theta$ as given in Exercise 2.
14. Repeat Exercise 12 for $\theta$ as given in Exercise 3.


15. Show that in estimating $\theta = E[(1 - U^2)^{1/2}]$ it is better to use $U^2$ rather than $U$ as the control variate. To do this, use simulation to approximate the necessary covariances.


16. Five elements, numbered 1, 2, 3, 4, 5, are initially arranged in a random 
order (i.e., the initial ordering is a random permutation of 1, 2, 3, 4, 5). At 
each stage one of the elements is selected and put at the front of the list. That 
is, if the present order is 2, 3, 4, 1, 5 and element 1 is chosen, then the new 
ordering is 1, 2, 3, 4, 5. Suppose that each selection is, independently, element 
i with probability p;, where p; = 4, Pa = $ Ps = $, Pa = f Ps = - Let Lj 
denote the position of the $j$th element to be selected, and let $L = \sum_{j=1}^{100} L_j$. We are interested in using simulation to estimate $E[L]$.

(a) Explain how we could use simulation to estimate $E[L]$.

(b) Compute $E[N_i]$, where $N_i$ is the number of times element $i$ is chosen in the 100 selections.

(c) Let $Y = \sum_{i=1}^{5} iN_i$. Do you think $Y$ is positively or negatively correlated with $L$?

(d) Develop a study to estimate $E[L]$, using $Y$ as a control variable.

(e) Give a different approach using the idea of Example 8i, and develop a study to determine the efficiency of this approach.


17. Let $X$ and $Y$ be independent with respective distributions $F$ and $G$ and with expected values $\mu_X$ and $\mu_Y$. For a given value $t$, we are interested in estimating $\theta = P\{X + Y \le t\}$.

(a) Give the raw simulation approach to estimating $\theta$.

(b) Use “conditioning” to obtain an improved estimator. 

(c) Give a control variable that can be used to further improve upon the 
estimator in (b). 


18. Suppose that $Y$ is a normal random variable with mean 1 and variance 1, and suppose that, conditional on $Y = y$, $X$ is a normal random variable with mean $y$ and variance 4. We want to use simulation to efficiently estimate $\theta = P\{X > 1\}$.


(a) Explain the raw simulation estimator. 

(b) Show how conditional expectation can be used to obtain an improved estimator.

(c) Show how the estimator of (b) can be further improved by using anti- 
thetic variables. 

(d) Show how the estimator of (b) can be further improved by using a 
control variable. 


Write a simulation program and use it to find the variances of 


(e) The raw simulation estimator. 

(f) The conditional expectation estimator. 

(g) The estimator using conditional expectation along with antithetic variables. 
(h) The estimator using conditional expectation along with a control variable. 
(i) What is the exact value of $\theta$?


[Hint: Recall that the sum of independent normal random variables is also normal.]
19. The number of casualty insurance claims that will be made to a branch office next week depends on an environmental factor $U$. If the value of this factor is $U = u$, then the number of claims will have a Poisson distribution with mean $15/(0.5 + u)$. Assuming that $U$ is uniformly distributed over (0, 1), let $p$ denote the probability that there will be at least 20 claims next week.


(a) Explain how to obtain the raw simulation estimator of p. 

(b) Develop an efficient simulation estimator that uses conditional 
expectation along with a control variable. 

(c) Develop an efficient simulation estimator that uses conditional expec- 
tation and antithetic variables. 

(d) Write a program to determine the variance of the estimators in parts (a), 
(b), and (c). 


20. (The Hit-Miss Method.) Let $g$ be a bounded function over the interval $[0, 1]$ (for example, suppose $0 \le g(x) \le b$ whenever $0 \le x \le 1$), and suppose we are interested in using simulation to approximate $\theta = \int_0^1 g(x)\,dx$. The hit-miss method for accomplishing this is to generate a pair of independent random numbers $U_1$ and $U_2$. Now set $X = U_1$, $Y = bU_2$ so that the random point $(X, Y)$ is uniformly distributed in a rectangle of length 1 and height $b$. Now set

$$I = \begin{cases} 1 & \text{if } Y < g(X) \\ 0 & \text{otherwise} \end{cases}$$

That is, $I$ is equal to 1 if the random point $(X, Y)$ falls within the shaded area of Figure 8.4.

Figure 8.4. The Hit-Miss Method. (Rectangle with corners (0, 0), (1, 0), (1, b), (0, b); the region under the curve g(x) is shaded.)

(a) Show that $E[I] = [\int_0^1 g(x)\,dx]/b$.

(b) Show that $\mathrm{Var}(bI) \ge \mathrm{Var}(g(U))$, and so the hit-miss estimator has a larger variance than simply computing $g$ of a random number.
21. Let X and Y be independent exponentials with X having mean 1 and Y
having mean 2, and suppose we want to use simulation to estimate P{X +Y > 4}. 
If you were going to use conditional expectation to reduce the variance of the 
estimator, would you condition on X or on Y? Explain your reasoning. 


22. Let $X$ and $Y$ be independent binomial $(n, p)$ random variables, and let $\theta = E[e^{XY}]$.

(a) Explain the simulation approach to estimate $\theta$.

(b) Give a control variate and explain how to utilize it to obtain an estimator having a smaller variance than the raw simulation estimator in (a).

(c) Give a different control variate which intuitively should perform better than the one given in (b). [Hint: Recall the series expansion of $f(x) = e^x$.]

(d) Use conditional expectation to improve on the raw simulation estimator.

(e) Improve upon the estimator in (d) by using a control variate.

(f) Estimate $\theta$ in an efficient manner. Stop the simulation when you are at least 95 percent confident that your estimate is correct to within 0.1.


23. Suppose that customers arrive at a single-server queueing station in accordance with a Poisson process with rate $\lambda$. Upon arrival they either enter service
if the server is free or join the queue. Upon a service completion the customer 
first in queue, if there are any customers in queue, enters service. All service 
times are independent random variables with distribution G. Suppose that the 
server is scheduled to take a break either at time T if the system is empty at 
that time or at the first moment past T that the system becomes empty. Let X 
denote the amount of time past T that the server goes on break, and suppose that 
we want to use simulation to estimate E[X]. Explain how to utilize conditional 
expectation to obtain an efficient estimator of E[X]. 

[Hint: Consider the simulation at time T regarding the remaining service time 
of the customer presently in service and the number waiting in queue. (This 
problem requires some knowledge of the theory of the M/G/1 busy period.)] 


24. Consider a single-server queue where customers arrive according to a Poisson process with rate 2 per minute and the service times are exponentially distributed with mean 1 minute. Let $T_i$ denote the amount of time that customer $i$ spends in the system. We are interested in using simulation to estimate $\theta = E[T_1 + \cdots + T_{10}]$.

(a) Do a simulation to estimate the variance of the raw simulation estimator. That is, estimate $\mathrm{Var}(T_1 + \cdots + T_{10})$.

(b) Do a simulation to determine the improvement over the raw estimator obtained by using antithetic variables.

(c) Do a simulation to determine the improvement over the raw estimator obtained by using $\sum_{i=1}^{10} S_i$ as a control variate, where $S_i$ is the $i$th service time.

(d) Do a simulation to determine the improvement over the raw estimator obtained by using $\sum_{i=1}^{10} S_i - \sum_{i=1}^{9} I_i$ as a control variate, where $I_i$ is the time between the $i$th and $(i+1)$st arrival.

(e) Do a simulation to determine the improvement over the raw estimator obtained by using the estimator $\sum_{i=1}^{10} E[T_i \mid N_i]$, where $N_i$ is the number in the system when customer $i$ arrives (and so $N_1 = 0$).


25. Repeat Exercise 10 of Chapter 5, this time using a variance reduction technique as in Example 8m. Estimate the variance of the new estimator as well as that of the estimator that does not use variance reduction.


26. In Example 8r, compute E[X|i] for i= 0, 1, 2. 


27. Estimate the variance of the raw simulation estimator of the expected payoff 
in the video poker model described in Example 8r. Then estimate the variance using 
the variance reduction suggested in that example. What is your estimate of the 
expected payoff? (If it is less than 1, then the game is unfair to the player.) 


28. Consider a system of 20 independent components, with component $i$ being nonfunctional with probability $0.5 + i/50$, $i = 1, \ldots, 20$. Let $X$ denote the number of nonfunctional components. Use simulation to efficiently estimate $P\{X \le 5\}$.

29. Estimate $P\{X = 5 \mid X \le 5\}$ in the preceding exercise.

30. If $X$ is such that $P\{0 \le X \le a\} = 1$, show that
(a) $E[X^2] \le aE[X]$.
(b) $\mathrm{Var}(X) \le E[X](a - E[X])$.
(c) $\mathrm{Var}(X) \le a^2/4$.

[Hint: Recall that $\max_{0 \le p \le 1} p(1 - p) = \frac{1}{4}$.]


31. In Example 8x, give an analytic upper bound on P{S > 62}. 
32. Use simulation to estimate E[S|S > 200] in Example 8x. 


33. Suppose we have a “black box” which on command can generate the value 
of a gamma random variable with parameters 3 and 1. Explain how we can 
use this black box to approximate E[e*/(X+1)], where X is an exponential 
random variable with mean 1. 


34. Suppose in Exercise 13 of Chapter 6 that we are interested in using simu- 
lation to estimate p, the probability that the system fails by some fixed time t. 
If p is very small, explain how we could use importance sampling to obtain a 
more efficient estimator than the raw simulation one. Choose some values for 
a, C, and t that make p small, and do a simulation to estimate the variance of 
an importance sampling estimator as well as the raw simulation estimator of p. 
35. Consider two different approaches for manufacturing a product. The profit from these approaches depends on the value of a parameter $\alpha$, and let $v_i(\alpha)$ denote the profit of approach $i$ as a function of $\alpha$. Suppose that approach 1 works best for small values of $\alpha$ in that $v_1(\alpha)$ is a decreasing function of $\alpha$, whereas approach 2 works best for large values of $\alpha$ in that $v_2(\alpha)$ is an increasing function of $\alpha$. If the daily value of $\alpha$ is a random variable coming from the distribution $F$, then in comparing the average profit of these two approaches, should we generate a single value of $\alpha$ and compute the profits for this $\alpha$, or should we generate $\alpha_1$ and $\alpha_2$ and then compute $v_i(\alpha_i)$, $i = 1, 2$?


36. Consider a list of $n$ names, where $n$ is very large, and suppose that a given name may appear many times on the list. Let $N(i)$ denote the number of times the name in position $i$ appears on the list, $i = 1, \ldots, n$, and let $\theta$ denote the number of distinct names on the list. We are interested in using simulation to estimate $\theta$.

(a) Argue that $\theta = \sum_{i=1}^{n} \frac{1}{N(i)}$.

Let $X$ be equally likely to be $1, \ldots, n$. Determine the name in position $X$ and go through the list starting from the beginning, stopping when you reach that name. Let $Y = 1$ if the name is first reached at position $X$ and let $Y = 0$ otherwise. (That is, $Y = 1$ if the first appearance of the name is at position $X$.)

(b) Argue that $E[Y \mid N(X)] = \frac{1}{N(X)}$.

(c) Argue that $E[nY] = \theta$.

(d) Now, let $W = 1$ if position $X$ is the last time that the name in that position appears on the list, and let it be 0 otherwise. (That is, $W = 1$ if, going from the back to the front of the list, the name is first reached at position $X$.) Argue that $n(W + Y)/2$ is an unbiased estimator of $\theta$.

(e) Argue that if every name on the list appears at least twice, then the estimator in (d) is a better estimator of $\theta$ than is $n(Y_1 + Y_2)/2$, where $Y_1$ and $Y_2$ are independent and distributed as is $Y$.

(f) Argue that $n/N(X)$ has smaller variance than the estimator in (e), although the estimator in (e) may still be more efficient when replication is very high because its search process is quicker.


37. Let $\Phi^{-1}(x)$ be the inverse function of the standard normal distribution function $\Phi(x)$. Assuming that you can efficiently compute both $\Phi(x)$ and $\Phi^{-1}(x)$, show that you can generate a standard normal random variable $Z$ that is conditioned to exceed $c$ by generating a random number $U$, letting $Y = U + (1 - U)\Phi(c)$, and setting

$$Z = \Phi^{-1}(Y)$$

[Excel has built-in programs to compute both $\Phi(x)$ and $\Phi^{-1}(x)$.]
38. For the compound random vector estimator $\mathcal{E}$ of Section 8.5, show that

$$\mathrm{Var}(\mathcal{E}) \le \mathrm{Var}(g_N(X_1, \ldots, X_N))$$

Hint: Show that $\mathcal{E}$ is a conditional expectation estimator.


39. Suppose we want to use simulation to determine $\theta = E[h(Z_1, \ldots, Z_n)]$ where $Z_1, \ldots, Z_n$ are independent standard normal random variables, and where $h$ is an increasing function of each of its coordinates. Let $W = \sum_{i=1}^{n} a_i Z_i$, where all the $a_i$ are nonnegative. Using the following lemma, explain how we can use stratified sampling, stratifying on $W$, to approximate $\theta$. Assume that the inverse transform method will be used to simulate $W$.

Lemma. If the standard normal random variable $Z$ is independent of $X$, a normal random variable with mean $\mu$ and variance $\sigma^2$, then the conditional distribution of $Z$ given that $Z + X = t$ is normal with mean $\frac{t - \mu}{1 + \sigma^2}$ and variance $\frac{\sigma^2}{1 + \sigma^2}$.


40. Explain how the approach of the preceding problem can be used when $h(x_1, \ldots, x_n)$ is an increasing function of some of its variables, and a decreasing function of the others.
Statistical Validation Techniques


Introduction 


In this chapter we consider some statistical procedures that are useful in validating 
simulation models. Sections 9.1 and 9.2 consider goodness of fit tests, which are 
useful in ascertaining whether an assumed probability distribution is consistent 
with a given set of data. In Section 9.1 we suppose that the assumed distribution 
is totally specified, whereas in Section 9.2 we suppose that it is only specified up 
to certain parameters—for example, it may be Poisson having an unknown mean. 
In Section 9.3 we show how one can test the hypothesis that two separate samples of data come from the same underlying population, as would be the case with real and simulated data when the assumed mathematical model being simulated is an accurate representation of reality. The results of Section 9.3 are particularly useful in testing the validity of a simulation model. A generalization to the case of many samples is also presented in this section. Finally, in Section 9.4, we show how to use real data to test the hypothesis that the process generating the data constitutes a nonhomogeneous Poisson process. The case of a homogeneous Poisson process is also considered in this section.


9.1 Goodness of Fit Tests 


One often begins a probabilistic analysis of a given phenomenon by hypothesiz- 
ing that certain of its random elements have a particular probability distribution. 
For example, we might begin an analysis of a traffic network by supposing that 
the daily number of accidents has a Poisson distribution. Such hypotheses can be 
statistically tested by observing data and then seeing whether the assumption of 
a particular probability distribution is consistent with these data. These statistical
tests are called goodness of fit tests. 

One way of performing a goodness of fit test is to first partition the possible 
values of a random quantity into a finite number of regions. A sample of values 
of this quantity is then observed and a comparison is made between the numbers 
of them that fall into each of the regions and the theoretical expected numbers 
when the specified probability distribution is indeed governing the data. 

In this section we consider goodness of fit tests when all the parameters of 
the hypothesized distribution are specified; in the following section we consider 
such tests when certain of the parameters are unspecified. We first consider the 
case of a discrete and then a continuous hypothesized distribution. 


The Chi-Square Goodness of Fit Test for Discrete Data 


Suppose that $n$ independent random variables $Y_1, \ldots, Y_n$, each taking on one of the values $1, 2, \ldots, k$, are to be observed, and that we are interested in testing the hypothesis that $\{p_i, i = 1, \ldots, k\}$ is the probability mass function of these random variables. That is, if $Y$ represents any of the $Y_j$, the hypothesis to be tested, which we denote by $H_0$ and refer to as the null hypothesis, is

$$H_0: P\{Y = i\} = p_i, \qquad i = 1, \ldots, k$$

To test the foregoing hypothesis, let $N_i$, $i = 1, \ldots, k$, denote the number of the $Y_j$'s that equal $i$. Because each $Y_j$ independently equals $i$ with probability $P\{Y = i\}$, it follows that, under $H_0$, $N_i$ is binomial with parameters $n$ and $p_i$. Hence, when $H_0$ is true,

$$E[N_i] = np_i$$

and so $(N_i - np_i)^2$ is an indication as to how likely it appears that $p_i$ indeed equals the probability that $Y = i$. When this is large, say, in relation to $np_i$, then it is an indication that $H_0$ is not correct. Indeed, such reasoning leads us to consider the quantity

$$T = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i}$$

and to reject the null hypothesis when $T$ is large.

Whereas small values of the test quantity $T$ are evidence in favor of the hypothesis $H_0$, large ones are indicative of its falsity. Suppose now that the actual data result in the test quantity $T$ taking on the value $t$. To see how unlikely such a large outcome would have been if the null hypothesis had been true, we define the so-called p-value by

$$\text{p-value} = P_{H_0}\{T \ge t\}$$
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where we have used the notation Py, to indicate that the probability is to be 
computed under the assumption that Hp is correct. Hence, the p-value gives the 
probability that a value of T as large as the one observed would have occurred if 
the null hypothesis were true. It is typical to reject the null hypothesis—saying 
that it appears to be inconsistent with the data—when a small p-value results 
(a value less than 0.05, or more conservatively, 0.01 is usually taken to be 


- critical) and to accept the null hypothesis—saying that it appears to be consistent 


with the data—otherwise. 
After observing the value (call it $t$) of the test quantity, it thus remains to determine the probability

$$\text{p-value} = P_{H_0}\{T \ge t\}$$

A reasonably good approximation to this probability can be obtained by using the classical result that, for large values of $n$, $T$ has approximately a chi-square distribution with $k - 1$ degrees of freedom when $H_0$ is true. Hence,

$$\text{p-value} \approx P\{\chi^2_{k-1} \ge t\} \qquad (9.1)$$

where $\chi^2_{k-1}$ is a chi-square random variable with $k - 1$ degrees of freedom.


Example 9a Consider a random quantity which can take on any of the possible values 1, 2, 3, 4, 5, and suppose we want to test the hypothesis that these values are equally likely to occur. That is, we want to test

$$H_0:\; p_i = 0.2, \qquad i = 1, \dots, 5$$

If a sample of size 50 yielded the following values of $N_i$:

12, 5, 19, 7, 7

then the approximate p-value is obtained as follows. Since $np_i = 10$ for each $i$, the value of the test statistic $T$ is given by

$$T = \frac{4 + 25 + 81 + 9 + 9}{10} = 12.8$$

This yields

$$\text{p-value} \approx P\{\chi^2_4 \ge 12.8\} = 0.0122$$

For such a low p-value the hypothesis that all outcomes are equally likely would be rejected. □
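The computation in Example 9a can be reproduced with a short Python sketch. The helper names `chi_square_stat` and `chi2_sf` are our own; to stay within the standard library, the chi-square tail probability in (9.1) is evaluated from the power series of the regularized incomplete gamma function rather than from a statistics package. That is adequate for moderate degrees of freedom and values of $t$, but it is not meant as a production-quality routine.

```python
import math

def chi2_sf(t, df):
    """P{chi^2_df >= t} = 1 - P(df/2, t/2), using the power series of the
    regularized lower incomplete gamma function P(a, x)."""
    a, x = df / 2.0, t / 2.0
    if x <= 0.0:
        return 1.0
    term = total = 1.0 / a          # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_stat(counts, probs):
    """T = sum_i (N_i - n p_i)^2 / (n p_i) for fully specified probabilities p_i."""
    n = sum(counts)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs))

counts = [12, 5, 19, 7, 7]
probs = [0.2] * 5
T = chi_square_stat(counts, probs)
p_value = chi2_sf(T, df=len(counts) - 1)      # k - 1 degrees of freedom
print(f"T = {T:.1f}, approximate p-value = {p_value:.4f}")
```

Running it prints $T = 12.8$ and an approximate p-value of about 0.012, in line with Example 9a.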
If the p-value approximation given by Equation (9.1) is not too small (say, of the order of 0.15 or larger), then it is clear that the null hypothesis is not going to be rejected, and so there is no need to look for a better approximation. However, when the p-value is closer to a critical value (such as 0.05 or 0.01) we would probably want a more accurate estimate of its value than the one given by the chi-square approximation. Fortunately, a more accurate estimator can be obtained via a simulation study.

The simulation approach to estimating the p-value of the outcome $T = t$ is as follows. To determine the probability that $T$ would have been at least as large as $t$ when $H_0$ is true, we generate $n$ independent random variables $Y_1^{(1)}, \dots, Y_n^{(1)}$, each having the probability mass function $\{p_i, i = 1, \dots, k\}$; that is,

$$P\{Y_j^{(1)} = i\} = p_i, \qquad i = 1, \dots, k, \quad j = 1, \dots, n$$

Now let

$$N_i^{(1)} = \text{number } j : Y_j^{(1)} = i$$

and set

$$T^{(1)} = \sum_{i=1}^{k} \frac{\left(N_i^{(1)} - np_i\right)^2}{np_i}$$

Now repeat this procedure by simulating a second set, independent of the first set, of $n$ independent random variables $Y_1^{(2)}, \dots, Y_n^{(2)}$, each having the probability mass function $\{p_i, i = 1, \dots, k\}$, and then, as for the first set, determining $T^{(2)}$. Repeating this a large number of times, say $r$, yields $r$ independent random variables $T^{(1)}, T^{(2)}, \dots, T^{(r)}$, each of which has the same distribution as does the test statistic $T$ when $H_0$ is true. Hence, by the law of large numbers, the proportion of the $T^{(l)}$ that are as large as $t$ will be very nearly equal to the probability that $T$ is as large as $t$ when $H_0$ is true; that is,

$$\frac{\text{number } l : T^{(l)} \ge t}{r} \approx P_{H_0}\{T \ge t\}$$
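A minimal sketch of this simulation in Python, assuming the fully specified probabilities $p_i$ are supplied as a list and using the standard library's `random.Random.choices` to draw each $Y_j^{(l)}$; the function names, number of repetitions, and seed are illustrative, not from the text. The small tolerance in the comparison keeps simulated values of $T$ that equal $t$ from being lost to floating-point rounding.

```python
import random

def chi_square_stat(counts, probs):
    """T = sum_i (N_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs))

def simulated_pvalue(t, n, probs, r=10_000, seed=1):
    """Estimate P_{H0}{T >= t} from r simulated samples of size n drawn from probs."""
    rng = random.Random(seed)
    k = len(probs)
    hits = 0
    for _ in range(r):
        # Y_1, ..., Y_n with P{Y = i} = p_i; index i - 1 stands for the value i
        sample = rng.choices(range(k), weights=probs, k=n)
        counts = [0] * k
        for y in sample:
            counts[y] += 1
        # small tolerance so that ties with t are not lost to rounding
        if chi_square_stat(counts, probs) >= t - 1e-9:
            hits += 1
    return hits / r

print(simulated_pvalue(12.8, n=50, probs=[0.2] * 5))
```

Applied to Example 9a, the estimate typically comes out in the neighborhood of the chi-square value of about 0.012, although the two need not agree exactly, since (9.1) is only a large-sample approximation.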


The Kolmogorov-Smirnov Test for Continuous Data 


Now consider the situation where $Y_1, \dots, Y_n$ are independent random variables, and we are interested in testing the null hypothesis $H_0$ that they have the common distribution function $F$, where $F$ is a given continuous distribution function. One approach to testing $H_0$ is to break up the set of possible values of the $Y_j$ into $k$ distinct intervals, say,

$$(y_0, y_1),\ (y_1, y_2),\ \dots,\ (y_{k-1}, y_k), \qquad \text{where } y_0 = -\infty,\ y_k = +\infty$$

and then consider the discretized random variables $Y_j^d$, $j = 1, \dots, n$, defined by

$$Y_j^d = i \quad \text{if } Y_j \text{ lies in the interval } (y_{i-1}, y_i)$$

The null hypothesis then implies that

$$P\{Y_j^d = i\} = F(y_i) - F(y_{i-1}), \qquad i = 1, \dots, k$$

and this can be tested by the chi-square goodness of fit test already presented.

There is, however, another way of testing that the $Y_j$ come from the continuous distribution function $F$ which is generally more efficient than discretizing; it works as follows. After observing $Y_1, \dots, Y_n$, let $F_e$ be the empirical distribution function defined by

$$F_e(x) = \frac{\#\{i : Y_i \le x\}}{n}$$

That is, $F_e(x)$ is the proportion of the observed values that are less than or equal to $x$. Because $F_e(x)$ is a natural estimator of the probability that an observation is less than or equal to $x$, it follows that, if the null hypothesis that $F$ is the underlying distribution is correct, it should be close to $F(x)$. Since this is so for all $x$, a natural quantity on which to base a test of $H_0$ is the test quantity

$$D = \max_x |F_e(x) - F(x)|$$

where the maximum is over all values of $x$ from $-\infty$ to $+\infty$. The quantity $D$ is called the Kolmogorov-Smirnov test statistic.
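To compute the value of $D$ for a given data set $Y_j = y_j$, $j = 1, \dots, n$, let $y_{(1)}, y_{(2)}, \dots, y_{(n)}$ denote the values of the $y_j$ in increasing order. That is,

$$y_{(j)} = j\text{th smallest of } y_1, \dots, y_n$$

For example, if $n = 3$ and $y_1 = 3$, $y_2 = 5$, $y_3 = 1$, then $y_{(1)} = 1$, $y_{(2)} = 3$, $y_{(3)} = 5$. Since $F_e(x)$ can be written

$$F_e(x) = \begin{cases} 0 & \text{if } x < y_{(1)} \\ \dfrac{1}{n} & \text{if } y_{(1)} \le x < y_{(2)} \\ \quad\vdots & \\ \dfrac{j}{n} & \text{if } y_{(j)} \le x < y_{(j+1)} \\ \quad\vdots & \\ 1 & \text{if } y_{(n)} \le x \end{cases}$$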
Figure 9.1. $n = 5$ (jump points $y_{(1)}, \dots, y_{(5)}$ marked on the horizontal axis).


we see that $F_e(x)$ is constant within the intervals $(y_{(j-1)}, y_{(j)})$ and then jumps by $1/n$ at the points $y_{(1)}, \dots, y_{(n)}$. Since $F(x)$ is an increasing function of $x$ which is bounded by 1, it follows that the maximum value of $F_e(x) - F(x)$ is nonnegative and occurs at one of the points $y_{(j)}$, $j = 1, \dots, n$ (see Figure 9.1). That is,

$$\max_x \{F_e(x) - F(x)\} = \max_{j = 1, \dots, n} \left\{\frac{j}{n} - F(y_{(j)})\right\} \qquad (9.2)$$

Similarly, the maximum value of $F(x) - F_e(x)$ is also nonnegative and occurs immediately before one of the jump points $y_{(j)}$, and so

$$\max_x \{F(x) - F_e(x)\} = \max_{j = 1, \dots, n} \left\{F(y_{(j)}) - \frac{j - 1}{n}\right\} \qquad (9.3)$$


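From Equations (9.2) and (9.3) we see that

$$\begin{aligned} D &= \max_x |F_e(x) - F(x)| \\ &= \max\left\{\max_x \{F_e(x) - F(x)\},\ \max_x \{F(x) - F_e(x)\}\right\} \\ &= \max_{j = 1, \dots, n} \left\{\frac{j}{n} - F(y_{(j)}),\ F(y_{(j)}) - \frac{j - 1}{n}\right\} \qquad (9.4) \end{aligned}$$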


Equation (9.4) can be used to compute the value of D. 
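Equation (9.4) translates directly into code. The Python sketch below (the function name `ks_statistic` is our own) sorts the data and takes the largest of $j/n - F(y_{(j)})$ and $F(y_{(j)}) - (j-1)/n$ over $j$; the usage line applies it to the data and the exponential distribution function of Example 9b below.

```python
import math

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D computed from the order statistics, as in Equation (9.4)."""
    ys = sorted(data)
    n = len(ys)
    d = 0.0
    for j, y in enumerate(ys, start=1):
        Fy = cdf(y)
        d = max(d, j / n - Fy, Fy - (j - 1) / n)
    return d

# Data and hypothesized F(x) = 1 - exp(-x/100) from Example 9b.
data = [66, 72, 81, 94, 112, 116, 124, 140, 145, 155]
D = ks_statistic(data, lambda x: 1.0 - math.exp(-x / 100.0))
print(f"D = {D:.7f}")
```

It prints $D = 0.4831487$, the value reported in Example 9b; the maximum is attained at $j = 1$, where $F(y_{(1)}) - 0 = 1 - e^{-0.66}$.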

Suppose now that the $Y_j$ are observed and their values are such that $D = d$. Since a large value of $D$ would appear to be inconsistent with the null hypothesis that $F$ is the underlying distribution, it follows that the p-value for this data set is given by

$$\text{p-value} = P_F\{D \ge d\}$$

where we have written $P_F$ to make explicit that this probability is to be computed under the assumption that $H_0$ is correct (and so $F$ is the underlying distribution).

The above p-value can be approximated by a simulation that is made easier by the following proposition, which shows that $P_F\{D \ge d\}$ does not depend on the underlying distribution $F$. This result enables us to estimate the p-value by doing the simulation with any continuous distribution $F$ we choose [thus allowing us to use the uniform (0, 1) distribution].


Proposition $P_F\{D \ge d\}$ is the same for any continuous distribution $F$.

Proof

$$\begin{aligned} P_F\{D \ge d\} &= P_F\left\{\max_x \left|\frac{\#\{i : Y_i \le x\}}{n} - F(x)\right| \ge d\right\} \\ &= P_F\left\{\max_x \left|\frac{\#\{i : F(Y_i) \le F(x)\}}{n} - F(x)\right| \ge d\right\} \\ &= P\left\{\max_x \left|\frac{\#\{i : U_i \le F(x)\}}{n} - F(x)\right| \ge d\right\} \end{aligned}$$

where $U_1, \dots, U_n$ are independent uniform (0, 1) random variables. The first equality is the definition of $D$; the second follows because $F$ is an increasing function and so $Y \le x$ is equivalent to $F(Y) \le F(x)$, and the third because of the result (whose proof is left as an exercise) that if $Y$ has the continuous distribution $F$ then the random variable $F(Y)$ is uniform on (0, 1).

Continuing the above, we see, by letting $y = F(x)$ and noting that as $x$ ranges from $-\infty$ to $+\infty$, $F(x)$ ranges from 0 to 1, that

$$P_F\{D \ge d\} = P\left\{\max_{0 \le y \le 1} \left|\frac{\#\{i : U_i \le y\}}{n} - y\right| \ge d\right\}$$

which shows that the distribution of $D$, when $H_0$ is true, does not depend on the actual distribution $F$. □


It follows from the preceding proposition that after the value of $D$ is determined from the data, say, $D = d$, the p-value can be obtained by doing a simulation with the uniform (0, 1) distribution. That is, we generate a set of $n$ random numbers $U_1, \dots, U_n$ and then check whether or not the inequality

$$\max_{0 \le y \le 1} \left|\frac{\#\{i : U_i \le y\}}{n} - y\right| \ge d \qquad (9.5)$$

is valid. This is then repeated many times and the proportion of times that it is valid is our estimate of the p-value of the data set. As noted earlier, the left side of the inequality (9.5) can be computed by ordering the random numbers and then using the identity

$$\max_{0 \le y \le 1} \left|\frac{\#\{i : U_i \le y\}}{n} - y\right| = \max_{j = 1, \dots, n} \left\{\frac{j}{n} - U_{(j)},\ U_{(j)} - \frac{j - 1}{n}\right\}$$

where $U_{(j)}$ is the $j$th smallest value of $U_1, \dots, U_n$. For example, if $n = 3$ and $U_1 = 0.7$, $U_2 = 0.6$, $U_3 = 0.4$, then $U_{(1)} = 0.4$, $U_{(2)} = 0.6$, $U_{(3)} = 0.7$, and the value of $D$ for this data set is

$$D = \max\left\{\frac{1}{3} - 0.4,\ \frac{2}{3} - 0.6,\ 1 - 0.7,\ 0.4,\ 0.6 - \frac{1}{3},\ 0.7 - \frac{2}{3}\right\} = 0.4$$
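The simulation based on (9.5) and the identity above might be sketched in Python as follows, assuming the standard library's `random` module as the source of random numbers; the function names, run count, and seed are illustrative choices rather than part of the text.

```python
import random

def ks_uniform_stat(us):
    """Left side of (9.5): D for n uniforms, computed from their order statistics."""
    us = sorted(us)
    n = len(us)
    return max(max(j / n - u, u - (j - 1) / n) for j, u in enumerate(us, start=1))

def ks_pvalue_sim(d, n, runs=10_000, seed=1):
    """Estimate P_F{D >= d}; by the proposition, uniform (0, 1) samples suffice."""
    rng = random.Random(seed)
    hits = sum(ks_uniform_stat([rng.random() for _ in range(n)]) >= d
               for _ in range(runs))
    return hits / runs

print(ks_uniform_stat([0.7, 0.6, 0.4]))   # the n = 3 example above: 0.4
print(ks_pvalue_sim(0.4831487, n=10))      # the test quantity of Example 9b
```

With 10,000 runs the estimate for the test quantity of Example 9b should fall roughly between 0.01 and 0.02, consistent with the 0.012 obtained there from 500 runs; more runs shrink the simulation error.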


Example 9b Suppose we want to test the hypothesis that a given population distribution is exponential with mean 100; that is, $F(x) = 1 - e^{-x/100}$. If the (ordered) values from a sample of size 10 from this distribution are


66, 72, 81, 94, 112, 116, 124, 140, 145, 155 


what conclusion can be drawn? 

To answer the above, we first employ Equation (9.4) to compute the value of the Kolmogorov-Smirnov test quantity $D$. After some computation this gives the result $D = 0.4831487$. To obtain the approximate p-value we did a simulation which gave the following output:


RUN 

THIS PROGRAM USES SIMULATION TO APPROXIMATE THE p-value 
OF THE KOLMOGOROV-SMIRNOV TEST 

Random number seed (-32768 to 32767) ? 4567

ENTER THE VALUE OF THE TEST QUANTITY 

? 0.4831487 

ENTER THE SAMPLE SIZE 

? 10 

ENTER THE DESIRED NUMBER OF SIMULATION RUNS 

? 500 

THE APPROXIMATE p-value IS 0.012 

OK 


Because the p-value is so low (it is extremely unlikely that the smallest of a set of 10 values from the exponential distribution with mean 100 would be as large as 66; that probability is $e^{-10 \cdot 66/100} = e^{-6.6} \approx 0.0014$), the hypothesis would be rejected. □
9.2 Goodness of Fit Tests When Some Parameters Are Unspecified


The Discrete Data Case 


We can also perform a goodness of fit test of a null hypothesis that does not completely specify the probabilities $\{p_i, i = 1, \dots, k\}$. For example, suppose we are interested in testing whether the daily number of traffic accidents in a certain region has a Poisson distribution with some unspecified mean. To test this hypothesis, suppose that data are obtained over $n$ days and let $Y_i$ represent the number of accidents on day $i$, for $i = 1, \dots, n$. To determine whether these data are consistent with the assumption of an underlying Poisson distribution, we must first address the difficulty that, if the Poisson assumption is correct, these data can assume an infinite number of possible values. This difficulty is handled by breaking up the set of possible values into a finite number of, say, $k$ regions and then seeing in which of the regions the $n$ data points lie. For instance, if the geographical area of interest is small, and so there are not too many accidents in a day, we might say that the number of accidents in a given day falls in region $i$, $i = 1, 2, 3, 4, 5$, when there are $i - 1$ accidents on that day, and in region 6 when there are 5 or more accidents. Hence, if the underlying distribution is indeed Poisson with mean $\lambda$, then

$$\begin{aligned} p_i &= \frac{e^{-\lambda} \lambda^{i-1}}{(i - 1)!}, \qquad i = 1, 2, 3, 4, 5 \\ p_6 &= 1 - \sum_{j=0}^{4} \frac{e^{-\lambda} \lambda^j}{j!} \end{aligned} \qquad (9.6)$$


Another difficulty we face in obtaining a goodness of fit test of the hypothesis that the underlying distribution is Poisson is that the mean value $\lambda$ is not specified.

Now, the intuitive thing to do when $\lambda$ is unspecified is clearly to estimate its value from the data (call $\hat\lambda$ the estimate) and then compute the value of the test statistic

$$T = \sum_{i=1}^{k} \frac{(N_i - n\hat{p}_i)^2}{n\hat{p}_i}$$

where $N_i$ is the number of the $Y_j$ that fall in region $i$, and where $\hat{p}_i$ is the estimated probability, under $H_0$, that $Y_j$ falls in region $i$, $i = 1, \dots, k$, which is obtained by substituting $\hat\lambda$ for $\lambda$ in the expression (9.6).

The above approach can be used whenever there are unspecified parameters in the null hypothesis that are needed to compute the quantities $p_i$, $i = 1, \dots, k$. Suppose now that there are $m$ such unspecified parameters. It can be proved that, for reasonable estimators of these parameters, when $n$ is large the test quantity $T$ has, when $H_0$ is true, approximately a chi-square distribution with $k - 1 - m$ degrees of freedom. (In other words, one degree of freedom is lost for each parameter that needs to be estimated.)

If the test quantity takes on the value, say, $T = t$, then, using the above, the p-value can be approximated by

$$\text{p-value} \approx P\{\chi^2_{k-1-m} \ge t\}$$

where $\chi^2_{k-1-m}$ is a chi-square random variable with $k - 1 - m$ degrees of freedom.


Example 9c Suppose that over a 30-day period there are 6 days in which no accidents occurred, 2 in which 1 accident occurred, 1 in which 2 accidents occurred, 9 in which 3 occurred, 7 in which 4 occurred, 4 in which 5 occurred, and 1 in which 8 occurred. To test whether these data are consistent with the hypothesis of an underlying Poisson distribution, note first that since there were a total of 87 accidents, the estimate of the mean of the Poisson distribution is

$$\hat\lambda = \frac{87}{30} = 2.9$$

Since the estimate of $P\{Y = i\}$ is thus $e^{-2.9}(2.9)^i / i!$, we obtain that with the six regions as given at the beginning of this section

$$\hat{p}_1 = 0.0550, \quad \hat{p}_2 = 0.1596, \quad \hat{p}_3 = 0.2314, \quad \hat{p}_4 = 0.2237, \quad \hat{p}_5 = 0.1622, \quad \hat{p}_6 = 0.1682$$

Using the data values $N_1 = 6$, $N_2 = 2$, $N_3 = 1$, $N_4 = 9$, $N_5 = 7$, $N_6 = 5$, we see that the value of the test statistic is

$$T = \sum_{i=1}^{6} \frac{(N_i - 30\hat{p}_i)^2}{30\hat{p}_i} = 19.887$$

To determine the p-value we run Program 9-1, which yields

$$\text{p-value} \approx P\{\chi^2_4 \ge 19.887\} = 0.0005$$

and so the hypothesis of an underlying Poisson distribution is rejected. □
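The arithmetic of Example 9c can be checked with the following Python sketch. It bins the data into the six regions of (9.6), estimates $\lambda$ by the sample mean, and evaluates the chi-square tail with $k - 1 - m = 4$ degrees of freedom using a standard-library series (our own `chi2_sf` helper, not a library function; Program 9-1 itself is not reproduced here).

```python
import math

def chi2_sf(t, df):
    """P{chi^2_df >= t} via the series for the regularized lower incomplete gamma."""
    a, x = df / 2.0, t / 2.0
    if x <= 0.0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total * math.exp(-x + a * math.log(x) - math.lgamma(a)))

def poisson_region_probs(lam, k=6):
    """Regions of (9.6): region i (i < k) is the value i - 1; region k is 'k - 1 or more'."""
    probs = [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k - 1)]
    probs.append(1.0 - sum(probs))
    return probs

counts_by_value = {0: 6, 1: 2, 2: 1, 3: 9, 4: 7, 5: 4, 8: 1}   # days with each accident count
n = sum(counts_by_value.values())                                  # 30 days
lam_hat = sum(v * c for v, c in counts_by_value.items()) / n       # 87 / 30
N = [0] * 6
for v, c in counts_by_value.items():
    N[min(v, 5)] += c                                              # 5 or more goes to region 6
p_hat = poisson_region_probs(lam_hat)
T = sum((Ni - n * p) ** 2 / (n * p) for Ni, p in zip(N, p_hat))
p_value = chi2_sf(T, df=6 - 1 - 1)        # k = 6 regions, m = 1 estimated parameter
print(f"lambda_hat = {lam_hat:.1f}, T = {T:.3f}, p-value = {p_value:.4f}")
```

It gives $\hat\lambda = 2.9$, $T \approx 19.887$, and a p-value of about 0.0005, matching the example.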
We can also use simulation to estimate the p-value. However, since the null hypothesis no longer completely specifies the probability model, the use of simulation to determine the p-value of the test statistic is somewhat trickier than before. The way it should be done is as follows.
(a) The Model. Suppose that the null hypothesis is that the data values $Y_1, \dots, Y_n$ constitute a random sample from a distribution that is specified up to a set of unknown parameters $\theta_1, \dots, \theta_m$. Suppose also that when this hypothesis is true, the possible values of the $Y_j$ are $1, \dots, k$.

(b) The Initial Step. Use the data to estimate the unknown parameters. Specifically, let $\hat\theta_j$ denote the value of the estimator of $\theta_j$, $j = 1, \dots, m$. Now compute the value of the test statistic

$$T = \sum_{i=1}^{k} \frac{(N_i - n\hat{p}_i)^2}{n\hat{p}_i}$$

where $N_i$ is the number of the data values that are equal to $i$, $i = 1, \dots, k$, and $\hat{p}_i$ is the estimate of $p_i$ that results when $\hat\theta_j$ is substituted for $\theta_j$, for $j = 1, \dots, m$. Let $t$ denote the value of the test quantity $T$.

(c) The Simulation Step. We now do a series of simulations to estimate the p-value of the data. First note that all simulations are to be obtained by using the population distribution that results when the null hypothesis is true and $\theta_j$ is equal to its estimate $\hat\theta_j$, $j = 1, \dots, m$, determined in step (b).

Simulate a sample of size $n$ from the aforementioned population distribution and let $\hat\theta_j(\text{sim})$ denote the estimate of $\theta_j$, $j = 1, \dots, m$, based on the simulated data. Now determine the value of

$$T_{\text{sim}} = \sum_{i=1}^{k} \frac{(N_i - n\hat{p}_i(\text{sim}))^2}{n\hat{p}_i(\text{sim})}$$

where $N_i$ is the number of the simulated data values equal to $i$, $i = 1, \dots, k$, and $\hat{p}_i(\text{sim})$ is the value of $p_i$ when $\theta_j$ is equal to $\hat\theta_j(\text{sim})$, $j = 1, \dots, m$.

The simulation step should then be repeated many times. The estimate of the p-value is then equal to the proportion of the values of $T_{\text{sim}}$ that are at least as large as $t$.


Example 9d Let us reconsider Example 9c. The data presented in this example resulted in the estimate $\hat\lambda = 2.9$ and the test quantity value $T = 19.887$. The simulation step now consists of generating 30 independent Poisson random variables each having mean 2.9 and then computing the value of

$$T^* = \sum_{i=1}^{6} \frac{(X_i - 30p_i^*)^2}{30p_i^*}$$

where $X_i$ is the number of the 30 values that fall into region $i$, and $p_i^*$ is the probability that a Poisson random variable with a mean equal to the average of the 30 generated values would fall into region $i$. This simulation step should be repeated many times, and the estimated p-value is the proportion of times it results in a $T^*$ at least as large as 19.887. □
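The simulation step of Example 9d (what is often called a parametric bootstrap) can be sketched in Python as below. Poisson values are generated here by the inverse transform method, which is one reasonable choice for a mean as small as 2.9; the helper names, run count, and seed are our own assumptions, and the guard that skips zero-probability regions only matters in the degenerate case where a simulated sample is all zeros.

```python
import math
import random

def poisson_region_probs(lam, k=6):
    """Region probabilities of (9.6): values 0..k-2 individually, then 'k-1 or more'."""
    probs = [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k - 1)]
    probs.append(1.0 - sum(probs))
    return probs

def poisson_rv(lam, rng):
    """Inverse-transform Poisson generator (suitable for moderate lam)."""
    u = rng.random()
    i, p = 0, math.exp(-lam)
    F = p
    while u >= F and p > 0.0:
        i += 1
        p *= lam / i
        F += p
    return i

def region_stat(values, k=6):
    """T with the p_i re-estimated from `values` (step (b) on data, step (c) on simulations)."""
    n = len(values)
    lam = sum(values) / n                      # estimate of the Poisson mean
    counts = [0] * k
    for v in values:
        counts[min(v, k - 1)] += 1
    probs = poisson_region_probs(lam, k)
    # skip regions with zero estimated probability (only possible if all values are 0)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs) if p > 0.0)

def bootstrap_pvalue(t, lam_hat, n, runs=5_000, seed=1):
    """Proportion of simulated T values that are at least as large as t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        sim = [poisson_rv(lam_hat, rng) for _ in range(n)]
        if region_stat(sim) >= t - 1e-9:
            hits += 1
    return hits / runs

data = [0] * 6 + [1] * 2 + [2] + [3] * 9 + [4] * 7 + [5] * 4 + [8]
t = region_stat(data)                          # about 19.887, as in Example 9c
print(t, bootstrap_pvalue(t, lam_hat=sum(data) / len(data), n=len(data)))
```

The first value printed is the observed $T$ of Example 9c; the second is the simulation estimate of the p-value, which will vary somewhat with the seed and the number of runs.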
The Continuous Data Case


Now consider the situation where we want to test the hypothesis that the random variables $Y_1, \dots, Y_n$ have the continuous distribution function $F_\theta$, where $\theta = (\theta_1, \dots, \theta_m)$ is a vector of unknown parameters. For example, we might be interested in testing that the $Y_j$ come from a normally distributed population. To employ the Kolmogorov-Smirnov test we first use the data to estimate the parameter vector $\theta$, say, by the vector of estimators $\hat\theta$. The value of the test statistic $D$ is now computed by

$$D = \max_x |F_e(x) - F_{\hat\theta}(x)|$$

where $F_{\hat\theta}$ is the distribution function obtained from $F_\theta$ when $\theta$ is estimated by $\hat\theta$.

If the value of the test quantity is $D = d$, then the p-value can be roughly approximated by $P_{F_{\hat\theta}}\{D \ge d\} = P_U\{D \ge d\}$, where $P_U$ denotes the probability computed with uniform (0, 1) data, which by the proposition of the preceding section gives the same result as any continuous distribution. That is, after determining the value of $D$, a rough approximation, which actually overestimates the p-value, is obtained. If this does not result in a small estimate for the p-value, then, as the hypothesis is not going to be rejected, we might as well stop. However, if this estimated p-value is small, then a more accurate way of using simulation to estimate the true p-value is necessary. We now describe how this should be done.


STEP 1: Use the data to estimate $\theta$, say, by $\hat\theta$. Compute the value of $D$ as described above.

STEP 2: All simulations are to be done using the distribution $F_{\hat\theta}$. Generate a sample of size $n$ from this distribution and let $\hat\theta(\text{sim})$ be the estimate of $\theta$ based on this simulation run. Compute the value of

$$\max_x |F_{e,\text{sim}}(x) - F_{\hat\theta(\text{sim})}(x)|$$

where $F_{e,\text{sim}}$ is the empirical distribution function of the simulated data, and note whether it is at least as large as $d$. Repeat this many times and use the proportion of times that this test quantity is at least as large as $d$ as the estimate of the p-value.
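As an illustration of STEP 1 and STEP 2, the following Python sketch tests the hypothesis that data are exponential with an unknown mean, estimated by the sample mean. The choice of the exponential family, the reuse of the Example 9b numbers with the mean now treated as unknown, and all function names are assumptions made only for this example; `random.Random.expovariate` takes the rate $1/\hat\theta$.

```python
import math
import random

def ks_statistic(data, cdf):
    """D = max_j max(j/n - F(y_(j)), F(y_(j)) - (j-1)/n), Equation (9.4)."""
    ys = sorted(data)
    n = len(ys)
    d = 0.0
    for j, y in enumerate(ys, start=1):
        Fy = cdf(y)
        d = max(d, j / n - Fy, Fy - (j - 1) / n)
    return d

def exp_cdf(mean):
    return lambda x: 1.0 - math.exp(-x / mean)

def ks_pvalue_exponential(data, runs=5_000, seed=1):
    """STEP 1 and STEP 2 for H0: exponential with unknown mean (estimated by the sample mean)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = sum(data) / n                                   # STEP 1
    d = ks_statistic(data, exp_cdf(theta_hat))
    hits = 0
    for _ in range(runs):                                       # STEP 2, repeated
        sim = [rng.expovariate(1.0 / theta_hat) for _ in range(n)]
        theta_sim = sum(sim) / n                                # re-estimate from simulated data
        if ks_statistic(sim, exp_cdf(theta_sim)) >= d:
            hits += 1
    return d, hits / runs

sample = [66, 72, 81, 94, 112, 116, 124, 140, 145, 155]
print(ks_pvalue_exponential(sample))
```

For these numbers $\hat\theta = 110.5$ and $D \approx 0.4497$; the second returned value is the simulated p-value.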


9.3 The Two-Sample Problem 


Suppose we have formulated a mathematical model for a service system which clears all its customers at the end of a day; moreover, suppose that our model assumes that each day is probabilistically alike in that the probability laws for successive days are identical and independent. Some of the individual assumptions of the model (such as, for example, that the service times are all independent with the common distribution $G$, or that the arrivals of customers constitute a Poisson process) can be individually tested by using the results of Sections 9.1 and 9.2. Suppose that none of these individual tests results in a particularly small p-value and so all the parts of the model, taken individually, do not appear to be inconsistent with the real data we have about the system. [We must be careful here in what we mean by a small p-value because, even if the model is correct, if we perform a large number of tests then, by chance, some of the resulting p-values may be small. For example, if we perform $r$ separate tests on independent data, then the probability that at least one of the resulting p-values is as small as $\alpha$ is $1 - (1 - \alpha)^r$, which even for small $\alpha$ will become large as $r$ increases.]

At this stage, however, we are still not justified in asserting that our model is correct and has been validated by the real data; for the totality of the model, including not only all the individual parts but also our assumptions about the ways in which these parts interact, may still be inaccurate. One way of testing the model in its entirety is to consider some random quantity that is a complicated function of the entire model. For example, we could consider the total amount of waiting time of all customers that enter the system on a given day. Suppose that we have observed the real system for $m$ days and let $Y_i$, $i = 1, \dots, m$, denote the sum of these waiting times for day $i$. If we now simulate the proposed mathematical model for $n$ days, we can let $X_i$, $i = 1, \dots, n$, be the sum of the waiting times of all customers arriving on the (simulated) day $i$. Since the mathematical model supposes that all days are probabilistically alike and independent, it follows that all the random variables $X_1, \dots, X_n$ have some common distribution, which we denote by $F$. Now if the mathematical model is an accurate representation of the real system, then the real data $Y_1, \dots, Y_m$ also have the distribution $F$. That is, if the mathematical model is accurate, one should not be able to tell the simulated data apart from the real data. From this it follows that one way of testing the accuracy of the model in its entirety is to test the null hypothesis $H_0$ that $X_1, \dots, X_n, Y_1, \dots, Y_m$ are independent random variables having a common distribution. We now show how such a hypothesis can be tested.


Suppose we have two sets of data, $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$, and we want to test the hypothesis $H_0$ that these $n + m$ random variables are all independent and identically distributed. This statistical hypothesis testing problem is called the two-sample problem.

To test $H_0$, order the $n + m$ values $X_1, \dots, X_n, Y_1, \dots, Y_m$ and suppose for the time being that all $n + m$ values are distinct and so the ordering is unique. Now for $i = 1, \dots, n$, let $R_i$ denote the rank of $X_i$ among the $n + m$ data values; that is, $R_i = j$ if $X_i$ is the $j$th smallest among the $n + m$ values. The quantity

$$R = \sum_{i=1}^{n} R_i$$

equal to the sum of the ranks of the first data set, is used as our test quantity. (Either of the two data sets can be considered as the "first" set.)
If $R$ is either very large (indicating that the first data set tends to be larger than the second) or very small (indicating the reverse), then this would be strong evidence against the null hypothesis. Specifically, if $R = r$, we reject the null hypothesis if either

$$P_{H_0}\{R \le r\} \quad \text{or} \quad P_{H_0}\{R \ge r\}$$

is very low. Indeed, the p-value of the test data which results in $R = r$ is given by

$$\text{p-value} = 2\min\left(P_{H_0}\{R \le r\},\ P_{H_0}\{R \ge r\}\right) \qquad (9.7)$$

[It is twice the minimum of the probabilities because we reject either if $R$ is too small or too large. For example, suppose $r_*$ and $r^*$ were such that the probability, under $H_0$, of obtaining a value less (greater) than or equal to $r_*$ ($r^*$) is 0.05. Since the probability of either event occurring is, under $H_0$, 0.1, it follows that if the outcome is $r_*$ (or $r^*$) the p-value is 0.1.]

The hypothesis test resulting from the above p-value—that is, the test that 
calls for rejection of the null hypothesis when the p-value is sufficiently small— 
is called the two-sample rank sum test. (Other names that have also been used 
to designate this test are the Wilcoxon two-sample test and the Mann-Whitney
two-sample test.) 


Example 9e Suppose that direct observation of a system over 5 days has 
yielded that a certain quantity has taken on the successive values 


342, 448, 504, 361, 453 


whereas a 10-day simulation of a mathematical model proposed for the system 
has resulted in the following values: 


186, 220, 225, 456, 276, 199, 371, 426, 242, 311 


Because the five data values from the first set have ranks 8, 12, 15, 9, 13, it follows that the value of the test quantity is $R = 57$. □
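A Python sketch of the rank sum computation (the function name is our own). It also assigns tied values the average of the ranks they occupy, which is the convention described later in this section for data that are not all distinct.

```python
def rank_sum(first, second):
    """Sum of the ranks of `first` within the pooled sample; ties get average ranks."""
    pooled = sorted(first + second)
    avg_rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        avg_rank[pooled[i]] = (i + j + 2) / 2   # mean of the ranks i+1, ..., j+1
        i = j + 1
    return sum(avg_rank[x] for x in first)

print(rank_sum([342, 448, 504, 361, 453],
               [186, 220, 225, 456, 276, 199, 371, 426, 242, 311]))   # 57.0
print(rank_sum([2, 3, 4], [3, 5, 7]))                                # 7.5
```

The first call reproduces $R = 57$ for Example 9e; the second gives 7.5 for the tied example discussed later in this section.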


We can explicitly compute the p-value given in Equation (9.7) when $n$ and $m$ are not too large and all the data are distinct. To do so let

$$P_{n,m}(r) = P_{H_0}\{R \le r\}$$

Hence $P_{n,m}(r)$ is the probability that from two identically distributed data sets of sizes $n$ and $m$, the sum of the ranks of the data values from the first set is less than or equal to $r$. We can obtain a recursive equation for these probabilities by conditioning on whether the largest data value comes from the first or the second set. If the largest value is indeed contained in the first data set, the sum of the ranks of this set equals $n + m$ (the rank of the largest value) plus the sum of the ranks of the other $n - 1$ values from this set when considered along with the $m$ values from the other set. Hence, when the largest is contained in the first data set, the sum of the ranks of that set is less than or equal to $r$ if the sum of the ranks of the remaining $n - 1$ elements is less than or equal to $r - n - m$, and this is true with probability $P_{n-1,m}(r - n - m)$. By a similar argument we can show that if the largest value is contained in the second set, the sum of the ranks of the first set is less than or equal to $r$ with probability $P_{n,m-1}(r)$. Finally, since the largest value is equally likely to be any of the $n + m$ values, it follows that it is a member of the first set with probability $n/(n + m)$. Putting this together yields the following recursive equation:

$$P_{n,m}(r) = \frac{n}{n + m} P_{n-1,m}(r - n - m) + \frac{m}{n + m} P_{n,m-1}(r) \qquad (9.8)$$


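Starting with the boundary conditions

$$P_{1,0}(k) = \begin{cases} 0, & k \le 0 \\ 1, & k > 0 \end{cases} \qquad \text{and} \qquad P_{0,1}(k) = \begin{cases} 0, & k < 0 \\ 1, & k \ge 0 \end{cases}$$

Equation (9.8) can be recursively solved to obtain $P_{n,m}(r) = P_{H_0}\{R \le r\}$ and $P_{n,m}(r - 1) = 1 - P_{H_0}\{R \ge r\}$.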


Example 9f Five days of observation of a system yielded the following values of a certain quantity of interest:

132, 104, 162, 171, 129

A 10-day simulation of a proposed model of this system yielded the values

107, 94, 136, 99, 114, 122, 108, 130, 106, 88

Suppose the formulated model implies that these daily values should be independent and have a common distribution. To determine the p-value that results from the above data, note first that $R$, the sum of the ranks of the first sample, is

$$R = 12 + 4 + 14 + 15 + 10 = 55$$

A program using the recursion (9.8) yielded the following output:
THIS PROGRAM COMPUTES THE p-value FOR THE TWO-SAMPLE RANK SUM TEST
THIS PROGRAM WILL RUN FASTEST IF YOU DESIGNATE AS THE FIRST SAMPLE THE SAMPLE HAVING THE SMALLER SUM OF RANKS
ENTER THE SIZE OF THE FIRST SAMPLE
? 5
ENTER THE SIZE OF THE SECOND SAMPLE
? 10
ENTER THE SUM OF THE RANKS OF THE FIRST SAMPLE
? 55
The p-value IS 0.0752579
OK □
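The recursion (9.8), with the boundary conditions above, can be coded directly with memoization. This Python sketch is not the program whose output is shown in Example 9f; the function names are ours, and $P_{H_0}\{R \ge r\}$ is obtained as $1 - P_{n,m}(r - 1)$.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def P_nm(n, m, r):
    """P_{n,m}(r) = P_{H0}{R <= r}, via recursion (9.8) and its boundary conditions."""
    if n == 1 and m == 0:
        return 1.0 if r > 0 else 0.0
    if n == 0 and m == 1:
        return 1.0 if r >= 0 else 0.0
    total = 0.0
    if n > 0:   # largest value belongs to the first set (rank n + m)
        total += n / (n + m) * P_nm(n - 1, m, r - n - m)
    if m > 0:   # largest value belongs to the second set
        total += m / (n + m) * P_nm(n, m - 1, r)
    return total

def rank_sum_pvalue(n, m, r):
    """Equation (9.7): 2 * min(P{R <= r}, P{R >= r}), with P{R >= r} = 1 - P_{n,m}(r - 1)."""
    return 2 * min(P_nm(n, m, r), 1.0 - P_nm(n, m, r - 1))

print(rank_sum_pvalue(5, 10, 55))
```

For $n = 5$, $m = 10$, $r = 55$ it returns approximately 0.0752581 (that is, $2 \times 113/3003$), agreeing with the output above to about six decimal places.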


The difficulty with employing the recursion (9.8) to compute the p-value is that the amount of computation needed grows enormously as the sample sizes increase. For example, if $n = m = 20$, even if we choose the test quantity to be the smaller sum of ranks, then since the sum of all the ranks is $1 + 2 + \dots + 40 = 820$, it is possible that the test statistic could have a value as large as 410. Hence, there can be as many as $20 \times 20 \times 410 = 164{,}000$ values of $P_{n,m}(r)$ that would have to be computed to determine the p-value. Thus, for large samples, the use of the recursion provided by (9.8) may not be viable. Two different approximation methods that can be used in such cases are (a) a classical approach based on approximating the distribution of $R$ and (b) simulation.

To use the classical approach for approximating the p-value we make use of the fact that under $H_0$ all possible orderings of the $n + m$ values are equally likely. Using this fact it is easy to show that

$$E_{H_0}[R] = \frac{n(n + m + 1)}{2}, \qquad \operatorname{Var}_{H_0}(R) = \frac{nm(n + m + 1)}{12}$$

Now it can be shown that, under $H_0$, when $n$ and $m$ are large, $R$ is approximately normally distributed. Hence, when $H_0$ is true,

$$\frac{R - n(n + m + 1)/2}{\sqrt{nm(n + m + 1)/12}} \quad \text{is approximately a standard normal.}$$

Because for a normal random variable $W$, the minimum of $P\{W \le r\}$ and $P\{W \ge r\}$ is the former when $r \le E[W]$, and the latter otherwise, it follows that when $n$ and $m$ are not too small (both being greater than 7 should suffice), we can approximate the p-value of the test result $R = r$ by

$$\text{p-value} \approx \begin{cases} 2P\{Z < r^*\} & \text{if } r \le \dfrac{n(n + m + 1)}{2} \\ 2P\{Z > r^*\} & \text{otherwise} \end{cases} \qquad (9.9)$$

where

$$r^* = \frac{r - n(n + m + 1)/2}{\sqrt{nm(n + m + 1)/12}}$$

and where $Z$ is a standard normal random variable.


Example 9g Let us see how well the classical approximation works for the data of Example 9f. In this case, since $n = 5$ and $m = 10$, we have that

$$\begin{aligned} \text{p-value} &= 2P_{H_0}\{R \ge 55\} \\ &\approx 2P\left\{Z \ge \frac{55 - 40}{\sqrt{50 \times 16 / 12}}\right\} \\ &= 2P\{Z \ge 1.8371\} \\ &= 0.066 \end{aligned}$$

which should be compared with the exact answer 0.075. □
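Equation (9.9) in Python, using the standard library's `math.erfc` for the normal probabilities ($P\{Z > z\} = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$); the function name is our own.

```python
import math

def rank_sum_normal_pvalue(n, m, r):
    """Equation (9.9): normal approximation to the two-sample rank sum p-value."""
    mean = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    r_star = (r - mean) / sd
    if r <= mean:
        return 2 * 0.5 * math.erfc(-r_star / math.sqrt(2))   # 2 P{Z < r*}
    return 2 * 0.5 * math.erfc(r_star / math.sqrt(2))        # 2 P{Z > r*}

print(round(rank_sum_normal_pvalue(5, 10, 55), 4))
```

For the data of Example 9f it returns about 0.066, the approximation obtained in Example 9g.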


The p-value of the two-sample rank test can also be approximated by simulation. To see how this is accomplished, recall that if the observed value of the test quantity $R$ is $R = r$, then the p-value is given by

$$\text{p-value} = 2\min\left(P_{H_0}\{R \ge r\},\ P_{H_0}\{R \le r\}\right)$$

Now, under $H_0$, provided that all the $n + m$ data values are distinct, it follows that all orderings among these data values are equally likely, and thus the ranks of the first data set of size $n$ have the same distribution as a random selection of $n$ of the values $1, 2, \dots, n + m$. Thus, under $H_0$, the probability distribution of $R$ can be approximated by continually simulating a random subset of $n$ of the integers $1, 2, \dots, n + m$ and determining the sum of the elements in the subset. The value of $P_{H_0}\{R \le r\}$ can be approximated by the proportion of simulations that result in a sum less than or equal to $r$, and the value of $P_{H_0}\{R \ge r\}$ by the proportion of simulations that result in a sum greater than or equal to $r$.
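A minimal Python sketch of this simulation, assuming distinct data values; `random.Random.sample` draws the random subset of $n$ ranks, and the function name, run count, and seed are illustrative.

```python
import random

def rank_sum_sim_pvalue(n, m, r, runs=20_000, seed=1):
    """Approximate 2 * min(P{R >= r}, P{R <= r}) using random n-subsets of {1, ..., n+m}."""
    rng = random.Random(seed)
    ranks = range(1, n + m + 1)
    le = ge = 0
    for _ in range(runs):
        s = sum(rng.sample(ranks, n))   # ranks of the first set under H0
        le += s <= r
        ge += s >= r
    return 2 * min(le, ge) / runs

print(rank_sum_sim_pvalue(5, 10, 55))
```

For Example 9f ($n = 5$, $m = 10$, $r = 55$) the estimate should lie near the exact value 0.0753, up to simulation error.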

The above analysis supposes that all the $n + m$ data values are distinct. When certain of the values have a common value, one should take as the rank of a datum value the average of the ranks of the values equal to it. For example, if the first data set is 2, 3, 4 and the second 3, 5, 7, then the sum of the ranks of the first set is $1 + 2.5 + 4 = 7.5$. The p-value should be approximated by using the normal approximation via Equation (9.9).

A generalization of the two-sample problem is the multisample problem, where one has the following $m$ data sets:

$$\begin{array}{llll} X_{1,1}, & X_{1,2}, & \dots, & X_{1,n_1} \\ X_{2,1}, & X_{2,2}, & \dots, & X_{2,n_2} \\ \vdots & & & \\ X_{m,1}, & X_{m,2}, & \dots, & X_{m,n_m} \end{array}$$

and we are interested in testing the null hypothesis $H_0$ that all the $n = \sum_{i=1}^{m} n_i$ random variables are independent and have a common distribution. A generalization of the two-sample rank test, called the multisample rank test (or often referred to as the Kruskal-Wallis test), is obtained by first ranking all the $n$ data values. Then let $R_i$, $i = 1, \dots, m$, denote the sum of the ranks of all the $n_i$ data values from the $i$th set. (Note that with this notation $R_i$ is a sum of ranks and not an individual rank as previously.) Since, under $H_0$, all orderings are equally likely (provided all the data values are distinct), it follows exactly as before that

$$E[R_i] = n_i \frac{(n + 1)}{2}$$

Using the above, the multisample rank sum test is based on the test quantity

$$R = \frac{12}{n(n + 1)} \sum_{i=1}^{m} \frac{[R_i - n_i(n + 1)/2]^2}{n_i}$$


Since small values of $R$ indicate a good fit to $H_0$, the test based on the quantity $R$ rejects $H_0$ for sufficiently large values of $R$. Indeed, if the observed value of $R$ is $R = y$, the p-value of this result is given by

$$\text{p-value} = P_{H_0}\{R \ge y\}$$

This value can be approximated by using the result that for large values of $n_1, \dots, n_m$, $R$ has approximately a chi-square distribution with $m - 1$ degrees of freedom [this latter result being the reason why we include the term $12/n(n + 1)$ in the definition of $R$]. Hence, if $R = y$,

$$\text{p-value} \approx P\{\chi^2_{m-1} \ge y\}$$

Simulation can also be used to evaluate the p-value (see Exercise 14).

Even when the data values are not all distinct, the above approximation for the p-value should be used. In computing the value of $R$ the rank of an individual data value should be, as before, the average of all the ranks of the data equal to it.
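The multisample statistic and its chi-square approximation can be sketched in Python as follows, with tied values given average ranks as just described. The helper names are ours, and `chi2_sf` is a standard-library series approximation of the chi-square tail rather than a library routine.

```python
import math

def chi2_sf(t, df):
    """P{chi^2_df >= t} via the series for the regularized lower incomplete gamma."""
    a, x = df / 2.0, t / 2.0
    if x <= 0.0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total * math.exp(-x + a * math.log(x) - math.lgamma(a)))

def average_ranks(values):
    """Map each distinct value to the average of the ranks it occupies in sorted(values)."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + j + 2) / 2   # mean of ranks i+1, ..., j+1
        i = j + 1
    return rank_of

def kruskal_wallis(samples):
    """Multisample rank statistic R and its approximate p-value P{chi^2_{m-1} >= R}."""
    pooled = [x for s in samples for x in s]
    n = len(pooled)
    rank_of = average_ranks(pooled)
    total = 0.0
    for s in samples:
        Ri = sum(rank_of[x] for x in s)          # sum of ranks of this data set
        total += (Ri - len(s) * (n + 1) / 2) ** 2 / len(s)
    R = 12 / (n * (n + 1)) * total
    return R, chi2_sf(R, df=len(samples) - 1)

R, p = kruskal_wallis([[132, 104, 162, 171, 129],
                       [107, 94, 136, 99, 114, 122, 108, 130, 106, 88]])
print(R, p)
```

With $m = 2$ samples the statistic equals the square of the standardized rank sum, so for the two data sets of Example 9f it gives $R = 3.375 = (1.8371)^2$ and a p-value of about 0.066, the same as the normal approximation of Example 9g.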
9.4 Validating the Assumption of a Nonhomogeneous Poisson Process


Consider a mathematical model which supposes that the daily arrivals to a 
system occur in accordance with a nonhomogeneous Poisson process, with the 
arrival process from day to day being independent and having a common, but 
unspecified, intensity function. 

To validate such an assumption, suppose that we observe the system over $r$
days, noting the arrival times. Let $N_i$, $i = 1, \ldots, r$, denote the number of arrivals
on day $i$, and note that if the arrival process is indeed a nonhomogeneous Poisson
process, then these quantities are independent Poisson random variables with
the same mean. Now whereas this consequence could be tested by using the
goodness of fit approach, as is done in Example 9a, we present an alternative
approach that is sometimes more efficient. This alternative approach is based
on the fact that the mean and variance of a Poisson random variable are equal.
Hence, if the $N_i$ are indeed a sample from a Poisson distribution, the sample
mean

$$\bar{N} = \sum_{i=1}^{r} \frac{N_i}{r}$$

and the sample variance

$$S^2 = \sum_{i=1}^{r} \frac{(N_i - \bar{N})^2}{r - 1}$$

should be roughly equal. Motivated by this, we base our test of the hypothesis

$H_0$: the $N_i$ are independent Poisson random variables with a common mean

on the test quantity

$$T = \frac{S^2}{\bar{N}} \tag{9.10}$$

Because either a very small or a very large value of $T$ would be inconsistent with
$H_0$, the p-value for the outcome $T = t$ would be

$$\text{p-value} = 2\,\min\left(P_{H_0}\{T \leq t\},\; P_{H_0}\{T \geq t\}\right)$$


However, since $H_0$ does not specify the mean of the Poisson distribution, we
cannot immediately compute the above probabilities; rather, we must first use
the observed data to estimate the mean. By using the estimator $\bar{N}$, it follows that
if the observed value of $\bar{N}$ is $\bar{N} = m$, the p-value can be approximated by

$$\text{p-value} \approx 2\,\min\left(P_m\{T \leq t\},\; P_m\{T \geq t\}\right)$$

where $T$ is defined by Equation (9.10) with $N_1, \ldots, N_r$ being independent
Poisson random variables each with mean $m$. We can now approximate $P_m\{T \leq t\}$
and $P_m\{T \geq t\}$ via a simulation. That is, we continually generate $r$ independent
Poisson random variables with mean $m$ and compute the resulting value of $T$.
The proportion of these for which $T \leq t$ is our estimate of $P_m\{T \leq t\}$, and the
proportion for which $T \geq t$ is our estimate of $P_m\{T \geq t\}$.
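The simulation just described can be sketched in Python as follows, assuming only the standard library (which has no Poisson generator, so `poisson` below is a simple inverse-transform routine, adequate for moderate means). All function names are ours. Replications in which every simulated count is 0 leave $T$ undefined; skipping them is our own choice and matters only when $m$ is small.

```python
import math
import random


def poisson(lam, rng):
    """Inverse-transform Poisson(lam) generator (intended for moderate lam)."""
    u = rng.random()
    i, p = 0, math.exp(-lam)
    cdf = p
    while u >= cdf and p > 0.0:
        p *= lam / (i + 1)      # P{N = i+1} = P{N = i} * lam / (i + 1)
        i += 1
        cdf += p
    return i


def dispersion_T(counts):
    """Test quantity T = S^2 / Nbar of Equation (9.10)."""
    r = len(counts)
    mean = sum(counts) / r
    var = sum((x - mean) ** 2 for x in counts) / (r - 1)
    return var / mean


def dispersion_pvalue(counts, runs=500, seed=None):
    """Simulated 2 min(P_m{T <= t}, P_m{T >= t}) with m = Nbar."""
    rng = random.Random(seed)
    r = len(counts)
    m = sum(counts) / r
    t = dispersion_T(counts)
    le = ge = done = 0
    while done < runs:
        sim = [poisson(m, rng) for _ in range(r)]
        if sum(sim) == 0:       # T undefined when all counts are 0; skip (our choice)
            continue
        T = dispersion_T(sim)
        le += T <= t
        ge += T >= t
        done += 1
    return min(1.0, 2 * min(le, ge) / runs)
```

For the counts 18, 24, 16, 19, 25 of Example 9h, `dispersion_T` gives 0.75.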

If the above p-value is quite small, we reject the null hypothesis that the daily 
arrivals constitute a nonhomogeneous Poisson process. However, if the p-value 
is not small, this only implies that the assumption that the number of arrivals 
each day has a Poisson distribution is a viable assumption and does not by itself 
validate the stronger assumption that the actual arrival pattern (as determined 
by the nonhomogeneous intensity function) is the same from day to day. To 
complete our validation we must now consider the actual arrival times for each 
of the $r$ days observed. Suppose that the arrival times on day $j$, $j = 1, \ldots, r$,
are $X_{j,1}, X_{j,2}, \ldots, X_{j,N_j}$. Now if the arrival process is indeed a nonhomogeneous
Poisson process, it can be shown that each of these $r$ sets of arrival times
constitutes a sample from a common distribution. That is, under the null hypothesis, the
$r$ sets of data $X_{j,1}, \ldots, X_{j,N_j}$, $j = 1, \ldots, r$, are all independent random variables
from a common distribution.

The above consequence, however, can be tested by the multisample rank test
given in Section 9.3. That is, first rank all the $N = \sum_{j=1}^{r} N_j$ data values, and then
let $R_j$ denote the sum of the ranks of all the $N_j$ data values from the $j$th set. The
test quantity

$$R = \frac{12}{N(N+1)} \sum_{j=1}^{r} \frac{\left(R_j - N_j(N+1)/2\right)^2}{N_j}$$


can now be employed by using the fact that, when $H_0$ is true, $R$ has approximately
a chi-square distribution with $r - 1$ degrees of freedom. Hence, if the observed
value of $R$ is $R = y$, the resulting p-value can be approximated by

$$\begin{aligned}
\text{p-value} &= 2\,\min\left(P_{H_0}\{R \leq y\},\; P_{H_0}\{R \geq y\}\right) \\
&\approx 2\,\min\left(P\{\chi^2_{r-1} \leq y\},\; 1 - P\{\chi^2_{r-1} \leq y\}\right)
\end{aligned} \tag{9.11}$$


where $\chi^2_{r-1}$ is a chi-square random variable with $r - 1$ degrees of freedom. (Of
course, we could also approximate the p-value by a simulation.) If the above
p-value, along with the previous p-value considered, is not too small, we may
conclude that the data are not inconsistent with our assumption that daily arrivals
constitute a nonhomogeneous Poisson process.


A Technical Remark Many readers may wonder why we used a two-sided 
region to calculate the p-value in (9.11), rather than the one-sided region used in 
the multisample rank sum test. This is because a multisample rank sum test assumes
that the data come from m distributions, and, because R is small when these 
distributions are equal, a p-value based on a one-sided probability is appropriate. 
However, in testing for a periodic nonhomogeneous Poisson process, we want 
to test both that the arrival times on day i come from some distribution and that 
this distribution is the same for all i. That is, we do not start by assuming, as is 
done in the rank sum test, that we have data from a fixed number of separate 
distributions. Consequently, a two-sided test is appropriate, because a very small 
value of R might be indicative of some pattern of arrivals during a day, i.e., even 
though the number of arrivals each day might have the same Poisson distribution, 
the daily arrival times might not be independent and identically distributed.


Example 9h Suppose that the daily times at which deliveries are made at 
a certain plant are noted over 5 days. During this time the numbers of deliveries 
during each of the days are as follows: 


18, 24, 16, 19, 25 


Suppose also that when the 102 delivery times are ranked according to the time 
of day they arrived, the sums of the ranks of the deliveries from each day are 


1010, 960, 1180, 985, 1118 


Using the above data, let us test the hypothesis that the daily arrival process of
deliveries is a nonhomogeneous Poisson process.

We first test that the first data set of the daily number of deliveries consists of 
a set of five independent and identically distributed Poisson random variables. 
Now the sample mean and sample variance are equal to 


$$\bar{N} = 20.4 \quad \text{and} \quad S^2 = 15.3$$

and so the value of the test quantity is $T = 0.75$. To determine the approximate
p-value of the test that the $N_i$ are independent Poisson random variables, we
then simulated 500 sets of five Poisson random variables with mean 20.4 and
then computed the resulting value of $T = S^2/\bar{N}$. The output of this simulation
indicated a p-value of approximately 0.84, and so it is clear that the assumption 
that the numbers of daily deliveries are independent Poisson random variables 
having a common mean is consistent with the data. 

To continue our test of the null hypothesis of a nonhomogeneous Poisson 
process, we compute the value of the test quantity R, which is seen to be equal 
to 14.425. Because the probability that a chi-square random variable with four
degrees of freedom is as large as 14.425 is 0.006, it follows that the p-value is
0.012. For such a small p-value we must reject the null hypothesis.
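The arithmetic of this example can be checked with a short Python sketch. The function names are ours, and `chi2_sf` is a standard-library evaluation of the chi-square upper tail for integer degrees of freedom, via the recurrence in the degrees of freedom; the p-value is the two-sided one used above for $R$.

```python
import math


def chi2_sf(y, df):
    """P{chi-square with integer df degrees of freedom > y}."""
    if y <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-y / 2), 2
    else:
        q, k = math.erfc(math.sqrt(y / 2)), 1
    while k < df:
        q += (y / 2) ** (k / 2) * math.exp(-y / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def rank_statistic(day_counts, rank_sums):
    """R = 12/(N(N+1)) * sum_j (R_j - N_j (N+1)/2)^2 / N_j."""
    N = sum(day_counts)
    return 12 / (N * (N + 1)) * sum(
        (Rj - Nj * (N + 1) / 2) ** 2 / Nj for Nj, Rj in zip(day_counts, rank_sums))


def two_sided_pvalue(y, df):
    """2 min(P{chi2_df <= y}, P{chi2_df >= y})."""
    upper = chi2_sf(y, df)
    return 2 * min(1 - upper, upper)


counts = [18, 24, 16, 19, 25]             # deliveries per day
rank_sums = [1010, 960, 1180, 985, 1118]  # rank sums of each day's delivery times
R = rank_statistic(counts, rank_sums)
p = two_sided_pvalue(R, len(counts) - 1)
print(f"R = {R:.3f}, p-value = {p:.3f}")  # prints: R = 14.425, p-value = 0.012
```

Note that the rank sums add up to $102 \cdot 103/2 = 5253$, as they must for 102 ranked values.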
If we wanted to test the assumption that a daily arrival process constituted
a homogeneous Poisson process, we would proceed as before and first test the
hypothesis that the numbers of arrivals each day are independent and identically
distributed Poisson random variables. If the hypothesis remains plausible after
we perform this test, we again continue as in the nonhomogeneous case by
considering the actual set of $N = \sum_{j=1}^{r} N_j$ arrival times. However, we now
use the result that under a homogeneous Poisson process, given the number of
arrivals in a day, the arrival times are independently and uniformly distributed
over $(0, T)$, where $T$ is the length of a day. This consequence, however, can be
tested by the Kolmogorov-Smirnov goodness of fit test presented in Section 9.1.
That is, if the arrivals constitute a homogeneous Poisson process, the $N$ random
variables $X_{j,i}$, $i = 1, \ldots, N_j$, $j = 1, \ldots, r$, where $X_{j,i}$ represents the $i$th arrival
time on day $j$, can be regarded as constituting a set of $N$ independent and
uniformly distributed random variables over $(0, T)$. Hence, if we define the
empirical distribution function $F_e$ by letting $F_e(x)$ be the proportion of the $N$
data values that are less than or equal to $x$, that is,
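$$F_e(x) = \frac{1}{N} \sum_{j=1}^{r} \sum_{i=1}^{N_j} I_{j,i}(x)$$

where

$$I_{j,i}(x) = \begin{cases} 1 & \text{if } X_{j,i} \leq x \\ 0 & \text{otherwise} \end{cases}$$

then the value of the test quantity is

$$D = \max_{0 \leq x \leq T} \left| F_e(x) - \frac{x}{T} \right|$$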


Once the value of the test statistic D is determined, we can then find the resulting 
p-value by simulation, as is shown in Section 9.1. 
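A minimal Python sketch of this Kolmogorov-Smirnov check, assuming the arrival times from all $r$ days have been pooled into one list (function names ours). Because $F_e$ is a step function, the maximum is attained at a jump, so it suffices to compare $x_{(i)}/T$ with $(i-1)/N$ and $i/N$ at each ordered value $x_{(i)}$. The simulated p-value draws $N$ independent uniform $(0, T)$ values, as in the null model.

```python
import random


def ks_uniform_D(times, T):
    """D = max_{0 <= x <= T} |F_e(x) - x/T| for pooled arrival times on (0, T)."""
    xs = sorted(times)
    N = len(xs)
    D = 0.0
    for i, x in enumerate(xs, start=1):
        u = x / T
        # F_e jumps from (i-1)/N to i/N at x; check both sides of the jump
        D = max(D, i / N - u, u - (i - 1) / N)
    return D


def ks_uniform_pvalue(times, T, runs=10_000, seed=None):
    """Estimate P{D >= d} by simulating N independent uniform (0, T) arrival times."""
    rng = random.Random(seed)
    d = ks_uniform_D(times, T)
    N = len(times)
    hits = sum(
        ks_uniform_D([T * rng.random() for _ in range(N)], T) >= d
        for _ in range(runs))
    return hits / runs
```

Since the null distribution of $D$ does not depend on $T$, one could equally simulate on $(0, 1)$; keeping $T$ simply mirrors the text.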

If the hypothesis of a nonhomogeneous Poisson process is shown to be
consistent with the data, we face the problem of estimating the intensity function
$\lambda(t)$, $0 < t < T$, of this process. [In the homogeneous case the obvious estimator
is $\hat{\lambda}(t) = \hat{\lambda}/T$, where $\hat{\lambda}$ is the estimate of the mean number of arrivals in a day
of length $T$.] To estimate the intensity function, order the $N = \sum_{j=1}^{r} N_j$ daily
arrival times. Let $y_0 = 0$, and for $k = 1, \ldots, N$, let $y_k$ denote the $k$th smallest
of these $N$ arrival times. Because there has been a total of 1 arrival over $r$ days
within the time interval $(y_{k-1}, y_k)$, $k = 1, \ldots, N$, a reasonable estimate of $\lambda(t)$
would be

$$\hat{\lambda}(t) = \frac{1}{r\,(y_k - y_{k-1})} \quad \text{for } y_{k-1} < t < y_k$$
[To understand the above estimator, note that if $\hat{\lambda}(t)$ were the intensity function,
the expected number of daily arrivals that occur at a time point $t$ such that
$y_{k-1} < t < y_k$ would be given by

$$E\left[N(y_k) - N(y_{k-1})\right] = \int_{y_{k-1}}^{y_k} \hat{\lambda}(t)\, dt = \frac{1}{r}$$

and hence the expected number of arrivals within that interval over $r$ days
would be 1, which coincides with the actual observed number of arrivals in that
interval.]
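Because $\hat{\lambda}(t)$ is piecewise constant, it can be evaluated with a binary search over the ordered pooled arrival times. The following is a minimal Python sketch, assuming the pooled times are positive and distinct (ties would produce zero-length intervals). The half-open intervals $(y_{k-1}, y_k]$ and the value 0 returned outside $(0, y_N]$ are our own conventions, since the text defines the estimator only between $y_0$ and $y_N$; the function name is ours.

```python
import bisect


def intensity_estimator(pooled_times, r):
    """Piecewise-constant estimate of lambda(t) from arrival times pooled over r days.

    On (y_{k-1}, y_k] the estimate is 1 / (r (y_k - y_{k-1})), with y_0 = 0.
    """
    ys = [0.0] + sorted(pooled_times)

    def lam_hat(t):
        k = bisect.bisect_left(ys, t)    # smallest k with y_k >= t
        if k == 0 or k >= len(ys):
            return 0.0                   # outside (0, y_N]: our convention
        return 1.0 / (r * (ys[k] - ys[k - 1]))

    return lam_hat
```

Integrating this estimate over $(0, y_N]$ gives $N/r$, the average number of arrivals per day, as the bracketed argument above implies.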


Exercises 


1. According to the Mendelian theory of genetics, a certain garden pea plant 
should produce white, pink, or red flowers, with respective probabilities $\frac{1}{4}, \frac{1}{2}, \frac{1}{4}$.
To test this theory a sample of 564 peas was studied with the result that 141
produced white, 291 produced pink, and 132 produced red flowers. Approximate
the p-value of this data set


(a) by using the chi-square approximation, and 
(b) by using a simulation. 


2. To ascertain whether a certain die was fair, 1000 rolls of the die were 
recorded, with the result that the numbers of times the die landed i,i = 
1, 2,3, 4,5, 6 were, respectively, 158, 172, 164, 181, 160, 165. Approximate the 
p-value of the test that the die was fair 


(a) by using the chi-square approximation, and 
(b) by using a simulation. 


3. Approximate the p-value of the hypothesis that the following 10 values are 
random numbers: 0.12, 0.18, 0.06, 0.33, 0.72, 0.83, 0.36, 0.27, 0.77, 0.74. 


4. Approximate the p-value of the hypothesis that the following data set of 14 
points is a sample from a uniform distribution over (50, 200): 


164, 142, 110, 153, 103, 52, 174, 88, 178, 184, 58, 62, 132, 128 


5. Approximate the p-value of the hypothesis that the following 13 data values 
come from an exponential distribution with mean 50: 


86, 133, 75, 22, 11, 144, 78, 122, 8, 146, 33, 41, 99 
6. Approximate the p-value of the test that the following data come from a
binomial distribution with parameters $(8, p)$, where $p$ is unknown:

6, 7, 3, 4, 7, 3, 7, 2, 6, 3, 7, 8, 2, 1, 3, 5, 8, 7


7. Approximate the p-value of the test that the following data set comes from 
an exponentially distributed population: 122, 133, 106, 128, 135, 126. 


8. To generate the ordered values of $n$ random numbers we could generate
$n$ random numbers and then order, or sort, them. Another approach makes use
of the result that given that the $(n+1)$st event of a Poisson process occurs at
time $t$, the first $n$ event times are distributed as the set of ordered values of $n$
uniform $(0, t)$ random variables. Using this result, explain why, in the following
algorithm, $y_1, \ldots, y_n$ denote the ordered values of $n$ random numbers.

Generate $n + 1$ random numbers $U_1, \ldots, U_{n+1}$
$X_i = -\log U_i$, $i = 1, \ldots, n + 1$
$t = \sum_{i=1}^{n+1} X_i$, $c = \frac{1}{t}$
$y_i = y_{i-1} + c X_i$, $i = 1, \ldots, n$ (with $y_0 = 0$)
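A direct Python transcription of the algorithm in Exercise 8 (it does not supply the requested explanation). The function name is ours; `1 - rng.random()` is used so that the argument of the logarithm lies in $(0, 1]$.

```python
import math
import random


def ordered_uniforms(n, rng=random):
    """Return y_1 <= ... <= y_n, built from n + 1 exponential interarrival times."""
    xs = [-math.log(1.0 - rng.random()) for _ in range(n + 1)]   # X_i = -log U_i
    c = 1.0 / sum(xs)                                            # c = 1/t
    ys, y = [], 0.0
    for x in xs[:n]:
        y += c * x                                               # y_i = y_{i-1} + c X_i
        ys.append(y)
    return ys
```

No sort is performed: the values come out already in increasing order.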


9. Generate the values of 10 independent exponential random variables each
having mean 1. Then, based on the Kolmogorov-Smirnov test quantity, approximate
the p-value of the test that the data do indeed come from an exponential
distribution with mean 1.


10. An experiment designed to compare two treatments against corrosion 
yielded the following data (representing the maximum depth of pits in units of 
one-thousandth of an inch) in pieces of wire subjected to one or the other of the 
two treatments: 


Treatment 1: 65.2 67.1 69.4 78.4 74.0 80.3 
Treatment 2: 59.4 72.1 68.0 66.2 58.5 


Compute the exact p-value of this data set when testing the hypothesis that the 
two treatments have identical results. 


11. In Exercise 10, compute the approximate p-value based on 


(a) the normal approximation, and 
(b) a simulation. 


12. Fourteen cities, of roughly equal size, are chosen for a traffic safety study. 
Seven of them are randomly chosen, and in these cities a series of newspaper 
articles dealing with traffic safety are run over a 1-month period. The numbers
of traffic accidents reported in the month following this campaign are as follows: 


Treatment group: 19 31 39 45 47 66 75 
Control group: 28 36 44 4 52 72 72 


Determine the exact p-value when testing the hypothesis that the articles have
not had any effect. 


13. Approximate the p-value in Exercise 12 


(a) by using the normal approximation, and 
(b) by using a simulation. 


14. Explain how simulation can be employed to approximate the p-value in the 
multisample problem—that is, when testing that a set of m samples all come 
from the same probability distribution. 


15. Consider the following data resulting from three samples: 


Sample 1: 121 144 158 169 194 211 242 
Sample 2: 99 128 165 193 242 265 302 
Sample 3: 129 134 137 143 152 159 170 


Compute the approximate p-value of the test that all the data come from a single 
probability distribution 


(a) by using the chi-square approximation, and 
(b) by using a simulation. 


16. The numbers of daily arrivals over an 8-day interval are as follows:


122, 118, 120, 116, 125, 119, 124, 130 


Do you think the daily arrivals could be independent and identically distributed 
as nonhomogeneous Poisson processes? 
17. Over an interval of length 100 there have been 18 arrivals at the following 
times: 

12, 20, 33, 44, 55, 56, 61, 63, 66, 70, 73, 75, 78, 80, 82, 85, 87, 90 


Approximate the p-value of the test that the arrival process is a (homogeneous) 
Poisson process. 


Markov Chain Monte Carlo Methods


Introduction 


It is, in general, very difficult to simulate the value of a random vector X 
whose component random variables are dependent. In this chapter we present a 
powerful approach for generating a vector whose distribution is approximately 
that of X. This approach, called the Markov chain Monte Carlo method, has the 
added significance of only requiring that the mass (or density) function of X 
be specified up to a multiplicative constant, and this, we will see, is of great 
importance in applications. 

In Section 10.1 we introduce and give the needed results about Markov chains.
In Section 10.2 we present the Hastings–Metropolis algorithm for constructing
a Markov chain having a specified probability mass function as its limiting
distribution. A special case of this algorithm, referred to as the Gibbs sampler,
is studied in Section 10.3. The Gibbs sampler is probably the most widely used
Markov chain Monte Carlo method. An application of the preceding methods to
deterministic optimization problems, known as simulated annealing, is presented
in Section 10.4. In Section 10.5 we present the sampling importance resampling
(SIR) technique. While not strictly a Markov chain Monte Carlo algorithm, it
also results in approximately simulating a random vector whose mass function
is specified up to a multiplicative constant.


10.1 Markov Chains 


Consider a collection of random variables $X_0, X_1, \ldots$. Interpret $X_n$ as the "state
of the system at time $n$," and suppose that the set of possible values of the
$X_n$ (that is, the possible states of the system) is the set $1, \ldots, N$. If there exists
a set of numbers $P_{ij}$, $i, j = 1, \ldots, N$, such that whenever the process is in state $i$
then, independent of the past states, the probability that the next state is $j$ is $P_{ij}$,
then we say that the collection $\{X_n, n \geq 0\}$ constitutes a Markov chain having
transition probabilities $P_{ij}$, $i, j = 1, \ldots, N$. Since the process must be in some
state after it leaves state $i$, these transition probabilities satisfy

$$\sum_{j=1}^{N} P_{ij} = 1, \quad i = 1, \ldots, N$$


A Markov chain is said to be irreducible if for each pair of states $i$ and $j$ there
is a positive probability, starting in state $i$, that the process will ever enter state
$j$. For an irreducible Markov chain, let $\pi_j$ denote the long-run proportion of time
that the process is in state $j$. (It can be shown that $\pi_j$ exists and is constant, with
probability 1, independent of the initial state.) The quantities $\pi_j$, $j = 1, \ldots, N$,
can be shown to be the unique solution of the following set of linear equations:

$$\begin{aligned}
\pi_j &= \sum_{i=1}^{N} \pi_i P_{ij}, \quad j = 1, \ldots, N \\
\sum_{j=1}^{N} \pi_j &= 1
\end{aligned} \tag{10.1}$$


Remark The set of equations (10.1) has a heuristic interpretation. Since
$\pi_i$ is the proportion of time that the Markov chain is in state $i$ and since each
transition out of state $i$ is into state $j$ with probability $P_{ij}$, it follows that $\pi_i P_{ij}$ is
the proportion of time in which the Markov chain has just entered state $j$ from
state $i$. Hence, the top part of Equation (10.1) states the intuitively clear fact
that the proportion of time in which the Markov chain has just entered state $j$ is
equal to the sum, over all states $i$, of the proportion of time in which it has just
entered state $j$ from state $i$. The bottom part of Equation (10.1) says, of course,
that summing the proportion of time in which the chain is in state $j$, over all $j$,
must equal 1.


The $\{\pi_j\}$ are often called the stationary probabilities of the Markov chain. For
if the initial state of the Markov chain is distributed according to the $\{\pi_j\}$, then
$P\{X_n = j\} = \pi_j$ for all $n$ and $j$ (see Exercise 1).

An important property of Markov chains is that for any function $h$ on the state
space, with probability 1,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} h(X_i) = \sum_{j=1}^{N} \pi_j h(j) \tag{10.2}$$
The preceding follows since if $p_j(n)$ is the proportion of time that the chain is
in state $j$ between times $1, \ldots, n$, then

$$\frac{1}{n} \sum_{i=1}^{n} h(X_i) = \sum_{j=1}^{N} h(j)\, p_j(n) \to \sum_{j=1}^{N} h(j)\, \pi_j$$
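To see (10.1) and (10.2) side by side, the following Python sketch solves the stationary equations by Gaussian elimination (one redundant balance equation is replaced by the normalization) and compares the solution with the long-run average of $h(X_i)$ along one simulated path. States are numbered from 0 in the code, and the function names are ours.

```python
import random


def stationary_probabilities(P):
    """Solve pi_j = sum_i pi_i P[i][j] together with sum_j pi_j = 1 (Equation 10.1)."""
    N = len(P)
    # Row j encodes sum_i pi_i (P[i][j] - [i == j]) = 0; the last balance equation
    # is redundant, so it is replaced by the normalization sum_j pi_j = 1.
    A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(N)] for j in range(N)]
    A[-1] = [1.0] * N
    b = [0.0] * (N - 1) + [1.0]
    for c in range(N):                     # Gaussian elimination, partial pivoting
        piv = max(range(c, N), key=lambda row: abs(A[row][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for row in range(c + 1, N):
            f = A[row][c] / A[c][c]
            for col in range(c, N):
                A[row][col] -= f * A[c][col]
            b[row] -= f * b[c]
    pi = [0.0] * N
    for c in reversed(range(N)):           # back substitution
        pi[c] = (b[c] - sum(A[c][k] * pi[k] for k in range(c + 1, N))) / A[c][c]
    return pi


def long_run_average(P, h, n, x0=0, seed=None):
    """(1/n) sum_{i=1}^n h(X_i) along one simulated path (Equation 10.2)."""
    rng = random.Random(seed)
    states = range(len(P))
    x, total = x0, 0.0
    for _ in range(n):
        x = rng.choices(states, weights=P[x])[0]   # next state drawn from row x of P
        total += h(x)
    return total / n
```

For example, with `P = [[0.5, 0.5], [0.2, 0.8]]` the equations give $\pi = (2/7, 5/7)$, and `long_run_average(P, lambda j: j, 100000)` should be close to $5/7$, the long-run proportion of time spent in the second state.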


The quantity $\pi_j$ can often be interpreted as the limiting probability that the chain
is in state $j$. To make precise the conditions under which it has this interpretation,
we first need the definition of an aperiodic Markov chain.


Definition An irreducible Markov chain is said to be aperiodic if for some
$n \geq 0$ and some state $j$,

$$P\{X_n = j \mid X_0 = j\} > 0 \quad \text{and} \quad P\{X_{n+1} = j \mid X_0 = j\} > 0$$

It can be shown that if the Markov chain is irreducible and aperiodic then

$$\pi_j = \lim_{n \to \infty} P\{X_n = j\}, \quad j = 1, \ldots, N$$


There is sometimes an easier way than solving the set of equations (10.1)
of finding the stationary probabilities. Suppose one can find positive numbers
$x_j$, $j = 1, \ldots, N$, such that

$$x_i P_{ij} = x_j P_{ji} \quad \text{for } i \neq j, \qquad \sum_{j=1}^{N} x_j = 1$$

Then summing the preceding equations over all states $i$ (the equation holds
trivially when $i = j$) yields

$$\sum_{i=1}^{N} x_i P_{ij} = x_j \sum_{i=1}^{N} P_{ji} = x_j$$

which, since $\{\pi_j, j = 1, \ldots, N\}$ is the unique solution of (10.1), implies that

$$\pi_j = x_j$$

When $\pi_i P_{ij} = \pi_j P_{ji}$ for all $i \neq j$, the Markov chain is said to be time reversible,
because it can be shown, under this condition, that if the initial state is chosen
according to the probabilities $\{\pi_j\}$, then starting at any time the sequence of
states going backwards in time will also be a Markov chain with transition
probabilities $P_{ij}$.

Suppose now that we want to generate the value of a random variable $X$ having
probability mass function $P\{X = j\} = p_j$, $j = 1, \ldots, N$. If we could generate an
irreducible aperiodic Markov chain with limiting probabilities $p_j$, $j = 1, \ldots, N$,
then we would be able to approximately generate such a random variable by
running the chain for $n$ steps to obtain the value of $X_n$, where $n$ is large. In addition,
if our objective was to generate many random variables distributed according to
$p_j$, $j = 1, \ldots, N$, so as to be able to estimate $E[h(X)] = \sum_{j=1}^{N} h(j) p_j$, then we
could also estimate this quantity by using the estimator $\frac{1}{n} \sum_{i=1}^{n} h(X_i)$. However,
since the early states of the Markov chain can be strongly influenced by the initial
state chosen, it is common in practice to disregard the first $k$ states, for some
suitably chosen value of $k$. That is, the estimator $\frac{1}{n-k} \sum_{i=k+1}^{n} h(X_i)$ is utilized.
It is difficult to know exactly how large a value of $k$ should be used [although
the advanced reader should see Aarts and Korst (1989) for some useful results
along this line] and usually one just uses one's intuition (which usually works
fine because the convergence is guaranteed no matter what value is used).

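An important question is how to use the simulated Markov chain to estimate
the mean square error of the estimator. That is, if we let $\hat{\theta} = \frac{1}{n-k} \sum_{i=k+1}^{n} h(X_i)$,
how do we estimate

$$\text{MSE} = E\left[\left(\hat{\theta} - \sum_{j=1}^{N} h(j)\, p_j\right)^2\right]$$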


One way is the batch means method, which works as follows. Break up the $n - k$
generated states into $s$ batches of size $r$, where $s = (n - k)/r$ is integral, and let
$Y_j$, $j = 1, \ldots, s$, be the average of the $j$th batch. That is,

$$Y_j = \frac{1}{r} \sum_{i=k+(j-1)r+1}^{k+jr} h(X_i), \quad j = 1, \ldots, s$$

Now, treat the $Y_j$, $j = 1, \ldots, s$, as if they were independent and identically
distributed with variance $\sigma^2$ and use their sample variance
$\hat{\sigma}^2 = \sum_{j=1}^{s} (Y_j - \bar{Y})^2/(s - 1)$ as the estimator of $\sigma^2$. The estimate of MSE is $\hat{\sigma}^2/s$. The appropriate
value of $r$ depends on the Markov chain being simulated. The closer $X_i$, $i \geq 1$,
is to being independent and identically distributed, the smaller should be
the value of $r$.
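A minimal Python sketch of the batch means estimate, assuming the values $h(X_1), \ldots, h(X_n)$ have already been recorded in a list; the name `batch_means` is ours. It returns the point estimate $\hat{\theta}$ together with the estimated MSE $\hat{\sigma}^2/s$, and it needs at least two batches for the sample variance to be defined.

```python
import statistics


def batch_means(hs, k, r):
    """Return (theta_hat, estimated MSE) from h(X_1), ..., h(X_n) by batch means.

    The first k values are discarded; the remaining n - k values are split into
    s = (n - k) / r consecutive batches of size r.
    """
    kept = hs[k:]
    if len(kept) % r != 0:
        raise ValueError("n - k must be a multiple of the batch size r")
    s = len(kept) // r
    ys = [sum(kept[j * r:(j + 1) * r]) / r for j in range(s)]   # batch averages Y_j
    theta_hat = sum(kept) / len(kept)
    sigma2_hat = statistics.variance(ys)                         # divisor s - 1
    return theta_hat, sigma2_hat / s
```

In practice `hs` would be produced by a simulated chain; larger batch sizes are needed when successive states are strongly dependent.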

In the next two sections we will show, for a given set of positive numbers
$b_j$, $j = 1, \ldots, N$, how to construct a Markov chain whose limiting probabilities
are $\pi_j = b_j / \sum_{i=1}^{N} b_i$, $j = 1, \ldots, N$.


10.2 The Hastings-Metropolis Algorithm

Let $b(j)$, $j = 1, \ldots, m$, be positive numbers, and $B = \sum_{j=1}^{m} b(j)$. Suppose that $m$
is large and $B$ is difficult to calculate, and that we want to simulate a random
variable (or a sequence of random variables) with probability mass function

$$\pi(j) = b(j)/B, \quad j = 1, \ldots, m$$
One way of simulating a sequence of random variables whose distributions
converge to $\pi(j)$, $j = 1, \ldots, m$, is to find a Markov chain that is easy to simulate
and whose limiting probabilities are the $\pi(j)$. The Hastings–Metropolis algorithm
provides an approach for accomplishing this task. It constructs a time-reversible
Markov chain with the desired limiting probabilities, in the following manner.

Let $Q$ be an irreducible Markov transition probability matrix on the integers
$1, \ldots, m$, with $q(i, j)$ representing the row $i$, column $j$ element of $Q$. Now
define a Markov chain $\{X_n, n \geq 0\}$ as follows. When $X_n = i$, a random variable
$X$ such that $P\{X = j\} = q(i, j)$, $j = 1, \ldots, m$, is generated. If $X = j$, then $X_{n+1}$
is set equal to $j$ with probability $\alpha(i, j)$ and is set equal to $i$ with probability
$1 - \alpha(i, j)$. Under these conditions, it is easy to see that the sequence of states
will constitute a Markov chain with transition probabilities $P_{ij}$ given by

$$\begin{aligned}
P_{ij} &= q(i, j)\, \alpha(i, j), \quad \text{if } j \neq i \\
P_{ii} &= q(i, i) + \sum_{k \neq i} q(i, k)\left(1 - \alpha(i, k)\right)
\end{aligned}$$

Now this Markov chain will be time reversible and have stationary probabilities
$\pi(j)$ if

$$\pi(i) P_{ij} = \pi(j) P_{ji} \quad \text{for } j \neq i$$

which is equivalent to

$$\pi(i)\, q(i, j)\, \alpha(i, j) = \pi(j)\, q(j, i)\, \alpha(j, i)$$


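It is now easy to check that this will be satisfied if we take

$$\alpha(i, j) = \min\left(\frac{\pi(j)\, q(j, i)}{\pi(i)\, q(i, j)},\, 1\right) = \min\left(\frac{b(j)\, q(j, i)}{b(i)\, q(i, j)},\, 1\right) \tag{10.3}$$

[To check, note that if $\alpha(i, j) = \pi(j) q(j, i)/\pi(i) q(i, j)$, then $\alpha(j, i) = 1$, and vice
versa.]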

The reader should note that the value of $B$ is not needed to define the Markov
chain, as the values $b(j)$ suffice. Also, it is almost always the case that $\pi(j)$,
$j = 1, \ldots, m$, will not only be stationary probabilities but will also be limiting
probabilities. (Indeed, a sufficient condition is that $P_{ii} > 0$ for some $i$.)

The following sums up the Hastings–Metropolis algorithm for generating a
time-reversible Markov chain whose limiting probabilities are $\pi(j) = b(j)/B$,
$j = 1, \ldots, m$:

1. Choose an irreducible Markov transition probability matrix $Q$ with transition
probabilities $q(i, j)$, $i, j = 1, \ldots, m$. Also, choose some integer value $k$
between 1 and $m$.

2. Let $n = 0$ and $X_0 = k$.

3. Generate a random variable $X$ such that $P\{X = j\} = q(X_n, j)$ and generate
a random number $U$.

4. If $U < [b(X)\, q(X, X_n)]/[b(X_n)\, q(X_n, X)]$, then $NS = X$; else $NS = X_n$.

5. $n = n + 1$, $X_n = NS$.

6. Go to 3.
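The six steps above translate almost line for line into Python. In this sketch (names ours), the caller supplies the weights $b$, a sampler `propose` that draws from $q(X_n, \cdot)$, and the function $q$ itself; states can be any hashable values. The test in step 4 already incorporates the minimum with 1 in (10.3), since $U < 1$ always.

```python
import random


def hastings_metropolis(b, propose, q, x0, n_steps, seed=None):
    """Steps 2-6 of the Hastings-Metropolis algorithm; returns X_1, ..., X_{n_steps}.

    b(j)            : positive weights proportional to pi(j) (B is never needed)
    propose(i, rng) : draws X with P{X = j} = q(i, j)
    q(i, j)         : the proposal probabilities themselves
    """
    rng = random.Random(seed)
    x = x0                                          # step 2: X_0 = k
    path = []
    for _ in range(n_steps):
        y = propose(x, rng)                         # step 3: candidate X
        u = rng.random()                            # step 3: random number U
        if u < b(y) * q(y, x) / (b(x) * q(x, y)):   # step 4: accept, NS = X
            x = y
        path.append(x)                              # step 5: X_{n+1} = NS
    return path
```

As a check, with four states, weights $b = (1, 2, 3, 4)$, and $q(i, j) = 1/4$ for all $i, j$, the visit frequencies should settle near $(0.1, 0.2, 0.3, 0.4)$.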


Example 10a Suppose that we want to generate a random element from
a large complicated "combinatorial" set $\mathcal{S}$. For instance, $\mathcal{S}$ might be the set of
all permutations $(x_1, \ldots, x_n)$ of the numbers $(1, \ldots, n)$ for which $\sum_{j=1}^{n} j x_j > a$
for a given constant $a$; or $\mathcal{S}$ might be the set of all subgraphs of a given graph
having the property that for any pair of vertices $i$ and $j$ there is a unique path in
the subgraph from $i$ to $j$ (such subgraphs are called trees).

To accomplish our goal we will utilize the Hastings–Metropolis algorithm. We
shall start by assuming that one can define a concept of "neighboring" elements
of $\mathcal{S}$, and we will then construct a graph whose set of vertices is $\mathcal{S}$ by putting an
arc between each pair of neighboring elements in $\mathcal{S}$. For example, if $\mathcal{S}$ is the set
of permutations $(x_1, \ldots, x_n)$ for which $\sum_j j x_j > a$, then we can define two
such permutations to be neighbors if one results from an interchange of two of
the positions of the other. That is, $(1, 2, 3, 4)$ and $(1, 2, 4, 3)$ are neighbors,
whereas $(1, 2, 3, 4)$ and $(1, 3, 4, 2)$ are not. If $\mathcal{S}$ is a set of trees, then we can
say that two trees are neighbors if all but one of the arcs of one of the trees are
also arcs of the other tree.

Assuming this concept of neighboring elements, we define the $q$ transition
probability function as follows. With $N(s)$ defined as the set of neighbors of $s$,
and $|N(s)|$ equal to the number of elements in the set $N(s)$, let

$$q(s, t) = \frac{1}{|N(s)|} \quad \text{if } t \in N(s)$$

That is, the target next state from $s$ is equally likely to be any of its neighbors.
Since the desired limiting probabilities of the Markov chain are $\pi(s) = C$, it
follows that $\pi(s) = \pi(t)$, and so

$$\alpha(s, t) = \min\left(\frac{|N(s)|}{|N(t)|},\, 1\right)$$

That is, if the present state of the Markov chain is $s$, then one of its neighbors
is randomly chosen, say $t$. If $t$ is a state with fewer neighbors than $s$ (in
graph theory language, if the degree of vertex $t$ is less than that of vertex $s$),
then the next state is $t$. If not, a random number $U$ is generated, and the next
state is $t$ if $U < |N(s)|/|N(t)|$, and is $s$ otherwise. The limiting probabilities of
this Markov chain are $\pi(s) = 1/|\mathcal{S}|$.
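For the permutation version of this example, a minimal Python sketch follows; the names are ours. It assumes $a < \sum_j j^2$, so that the identity permutation belongs to $\mathcal{S}$ and can serve as the initial state, and it assumes the neighbor graph is connected, as the irreducibility of $Q$ requires. Neighbor lists are cached because each step needs both $|N(s)|$ and $|N(t)|$. The helper `constrained_set` lists $\mathcal{S}$ by brute force and is only meant for checking small cases.

```python
import itertools
import random


def score(x):
    """sum_j j * x_j for a permutation x = (x_1, ..., x_n)."""
    return sum(j * xj for j, xj in enumerate(x, start=1))


def constrained_set(n, a):
    """Brute-force listing of S (only sensible for small n, e.g., for checking)."""
    return [p for p in itertools.permutations(range(1, n + 1)) if score(p) > a]


def sample_constrained_permutations(n, a, n_steps, seed=None):
    """Hastings-Metropolis walk on S, where neighbors differ by one interchange."""
    rng = random.Random(seed)
    cache = {}

    def nbrs(s):
        # all elements of S obtained from s by interchanging two positions
        if s not in cache:
            out = []
            for i in range(n):
                for k in range(i + 1, n):
                    t = list(s)
                    t[i], t[k] = t[k], t[i]
                    t = tuple(t)
                    if score(t) > a:
                        out.append(t)
            cache[s] = out
        return cache[s]

    s = tuple(range(1, n + 1))            # identity permutation (assumed to be in S)
    path = []
    for _ in range(n_steps):
        t = rng.choice(nbrs(s))           # q(s, t) = 1/|N(s)|
        if rng.random() < len(nbrs(s)) / len(nbrs(t)):   # alpha(s, t)
            s = t
        path.append(s)
    return path
```

With $n = 4$ and $a = 25$, for instance, the visit frequencies of the elements of `constrained_set(4, 25)` should all approach $1/|\mathcal{S}|$.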
10.3 The Gibbs Sampler

The most widely used version of the Hastings–Metropolis algorithm is the Gibbs
sampler. Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a random vector with probability mass
function (or probability density function in the continuous case) $p(\mathbf{x})$ that need only
be specified up to a multiplicative constant, and suppose that we want to generate
a random vector whose distribution is that of $\mathbf{X}$. That is, we want to generate a
random vector having mass function

$$p(\mathbf{x}) = C g(\mathbf{x})$$

where $g(\mathbf{x})$ is known, but $C$ is not. Utilization of the Gibbs sampler assumes
that for any $i$ and values $x_j$, $j \neq i$, we can generate a random variable $X$ having
the probability mass function

$$P\{X = x\} = P\{X_i = x \mid X_j = x_j,\ j \neq i\} \tag{10.4}$$


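It operates by using the Hastings–Metropolis algorithm on a Markov chain with
states $\mathbf{x} = (x_1, \ldots, x_n)$, and with transition probabilities defined as follows.
Whenever the present state is $\mathbf{x}$, a coordinate that is equally likely to be any of
$1, \ldots, n$ is chosen. If coordinate $i$ is chosen, then a random variable $X$ whose
probability mass function is as given by Equation (10.4) is generated, and if
$X = x$ then the state $\mathbf{y} = (x_1, \ldots, x_{i-1}, x, x_{i+1}, \ldots, x_n)$ is considered as the
candidate next state. In other words, with $\mathbf{x}$ and $\mathbf{y}$ as given, the Gibbs sampler
uses the Hastings–Metropolis algorithm with

$$q(\mathbf{x}, \mathbf{y}) = \frac{1}{n} P\{X_i = x \mid X_j = x_j,\ j \neq i\} = \frac{p(\mathbf{y})}{n\, P\{X_j = x_j,\ j \neq i\}}$$

Since $\mathbf{x}$ and $\mathbf{y}$ agree in every coordinate except the $i$th, the same reasoning gives
$q(\mathbf{y}, \mathbf{x}) = p(\mathbf{x}) / \left(n\, P\{X_j = x_j,\ j \neq i\}\right)$. Because we want the limiting mass function to be $p$,
we see from Equation (10.3) that the vector $\mathbf{y}$ is then accepted as the new state with probability

$$\alpha(\mathbf{x}, \mathbf{y}) = \min\left(\frac{p(\mathbf{y})\, q(\mathbf{y}, \mathbf{x})}{p(\mathbf{x})\, q(\mathbf{x}, \mathbf{y})},\, 1\right) = \min\left(\frac{p(\mathbf{y})\, p(\mathbf{x})}{p(\mathbf{x})\, p(\mathbf{y})},\, 1\right) = 1$$

Hence, when utilizing the Gibbs sampler, the candidate state is always accepted
as the next state of the chain.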


Example 10b Suppose we want to generate $n$ random points in the circle
of radius 1 centered at the origin, conditional on the event that no two points are
within a distance $d$ of each other, where

$$\beta = P\{\text{no two points are within } d \text{ of each other}\}$$

is assumed to be a small positive number. (If $\beta$ were not small, then we could
just continue to generate sets of $n$ random points in the circle, stopping the
first time that no two points in the set are within $d$ of each other.) This can
be accomplished by the Gibbs sampler by starting with $n$ points in the circle,
$x_1, \ldots, x_n$, such that no two are within a distance $d$ of each other. Then generate
a random number $U$ and let $I = \text{Int}(nU) + 1$. Also generate a random point in
the circle. If this point is not within $d$ of any of the other $n - 1$ points excluding
$x_I$, then replace $x_I$ by this generated point; otherwise, generate a new point and
repeat the operation. After a large number of iterations the set of $n$ points will
approximately have the desired distribution.
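Example 10b can be sketched in Python as below, under our own assumptions: points in the unit disk are generated by rejection from the square $[-1, 1]^2$, "within $d$" is taken to mean at distance at most $d$, and the caller supplies a valid initial configuration. Function names are ours.

```python
import math
import random


def random_point_in_disk(rng):
    """Uniform point in the unit disk, by rejection from the square [-1, 1]^2."""
    while True:
        x, y = 2 * rng.random() - 1, 2 * rng.random() - 1
        if x * x + y * y <= 1:
            return (x, y)


def hard_core_gibbs(points, d, n_iter, seed=None):
    """Gibbs sampler of Example 10b; `points` must have no two within distance d."""
    rng = random.Random(seed)
    pts = list(points)
    n = len(pts)
    for _ in range(n_iter):
        I = int(n * rng.random())           # 0-based version of Int(nU) + 1
        while True:                         # resample x_I until it is far from the others
            p = random_point_in_disk(rng)
            if all(math.dist(p, q) > d for j, q in enumerate(pts) if j != I):
                pts[I] = p
                break
    return pts
```

Each replacement point is uniform on the part of the disk that keeps the configuration valid, which is exactly the conditional distribution the Gibbs sampler requires.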


Example 10c Queueing Networks Suppose that $r$ individuals move
among $m + 1$ queueing stations, and let, for $i = 1, \ldots, m$, $X_i(t)$ denote the
number of individuals at station $i$ at time $t$. If

$$p(n_1, \ldots, n_m) = \lim_{t \to \infty} P\{X_i(t) = n_i,\ i = 1, \ldots, m\}$$

then, assuming exponentially distributed service times, it can often be
established that

$$p(n_1, \ldots, n_m) = C \prod_{i=1}^{m} P_i(n_i), \quad \text{if } \sum_{i=1}^{m} n_i \leq r$$

where $P_i(n)$, $n \geq 0$, is a probability mass function for each $i = 1, \ldots, m$. Such a
joint probability mass function is said to have a product form.

Although it is often relatively straightforward both to establish that
$p(n_1, \ldots, n_m)$ has the preceding product form and to find the mass
functions $P_i$, it can be difficult to explicitly compute the constant $C$. For even
though

$$C \sum_{\mathbf{n}:\, s(\mathbf{n}) \leq r} \prod_{i=1}^{m} P_i(n_i) = 1$$

where $\mathbf{n} = (n_1, \ldots, n_m)$ and $s(\mathbf{n}) = \sum_i n_i$, it can be difficult to utilize this
result. This is because the summation is over all nonnegative integer vectors $\mathbf{n}$
for which $\sum_i n_i \leq r$, and there are $\binom{r+m}{m}$ such vectors, which is a rather large
number even when $m$ and $r$ are of moderate size.

Another approach to learning about $p(n_1, \ldots, n_m)$, which finesses the
computational difficulties of computing $C$, is to use the Gibbs sampler to generate a
sequence of values having a distribution approximately that of $p$.
To begin, note that if $\mathbf{N} = (N_1, \ldots, N_m)$ has the joint mass function $p$, then,
for $n = 0, \ldots, r - \sum_{k \neq i} n_k$,

$$\begin{aligned}
&P\{N_i = n \mid N_1 = n_1, \ldots, N_{i-1} = n_{i-1}, N_{i+1} = n_{i+1}, \ldots, N_m = n_m\} \\
&\quad = \frac{p(n_1, \ldots, n_{i-1}, n, n_{i+1}, \ldots, n_m)}{\sum_j p(n_1, \ldots, n_{i-1}, j, n_{i+1}, \ldots, n_m)}
= \frac{P_i(n)}{\sum_j P_i(j)}
\end{aligned}$$

where the preceding sums are over all $j = 0, \ldots, r - \sum_{k \neq i} n_k$. In other words, the
conditional distribution of $N_i$ given the values of $N_j$, $j \neq i$, is the same as the
conditional distribution of a random variable having mass function $P_i$ given that
its value is less than or equal to $r - \sum_{j \neq i} N_j$.

Thus, we may generate the values of a Markov chain whose limiting probability
mass function is $p(n_1, \ldots, n_m)$ as follows:

1. Let $(n_1, \ldots, n_m)$ be arbitrary nonnegative integers satisfying $\sum_i n_i \leq r$.

2. Generate $U$ and let $I = \text{Int}(mU + 1)$.

3. If $I = i$, let $X_i$ have mass function $P_i$ and generate a random variable $N$ whose
distribution is the conditional distribution of $X_i$ given that $X_i \leq r - \sum_{j \neq i} n_j$.

4. Let $n_i = N$ and go to 2.


The successive values of (7,,...,7,,) Constitute the sequence of states of a 
Markov chain with the limiting distribution p. All quantities of interest con- 
cerning p can be estimated from this sequence. For instance, the average of the 
values of the jth coordinate of these vectors will converge to the mean number 
of individuals at station j, the proportion of vectors whose jth coordinate is less 


_ than k will converge to the limiting probability that the number of individuals at 


station j is less than k, and so on. (m! 
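As a rough illustration of steps 1 to 4, the following Python sketch runs the random-scan Gibbs sampler for a product-form mass function. It is not taken from the text: the function name `gibbs_product_form`, the convention that `pmfs[i](k)` returns a possibly unnormalized value of $P_i(k)$, and the use of `random.Random.choices` to sample the truncated mass function in step 3 are assumptions made for the example.

```python
import random

def gibbs_product_form(pmfs, r, iters, seed=0):
    """Random-scan Gibbs sampler for p(n_1, ..., n_m) = C * prod_i P_i(n_i)
    on the set sum_i n_i <= r.  pmfs[i](k) returns P_i(k), possibly unnormalized.
    Returns the average value of each coordinate over the iterations."""
    rng = random.Random(seed)
    m = len(pmfs)
    n = [0] * m                          # step 1: the zero vector satisfies sum <= r
    totals = [0] * m
    for _ in range(iters):
        i = int(m * rng.random())        # step 2: I = Int(mU + 1), 0-based here
        bound = r - (sum(n) - n[i])      # n_i may range over 0, ..., bound
        weights = [pmfs[i](k) for k in range(bound + 1)]
        # step 3: draw from P_i conditioned on being <= bound; step 4: store it
        n[i] = rng.choices(range(bound + 1), weights=weights)[0]
        for j in range(m):
            totals[j] += n[j]
    return [t / iters for t in totals]
```

As a check, with $m = 2$, $r = 3$, $P_1(k) \propto 0.5^k$ and $P_2(k) \propto 0.8^k$ there are only ten admissible vectors, so the average returned for the first coordinate can be compared with the exact mean obtained by direct summation.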
Example 10d Let $X_i$, $i = 1,\dots,n$, be independent random variables with
$X_i$ having an exponential distribution with rate $\lambda_i$, $i = 1,\dots,n$. Let $S = \sum_{i=1}^n X_i$
and suppose we want to generate the random vector $\mathbf{X} = (X_1,\dots,X_n)$ conditional
on the event that $S > c$ for some large positive constant $c$. That is, we want
to generate the value of a random vector whose density function is given by

$$f(x_1,\dots,x_n) = \frac{1}{P\{S > c\}}\prod_{i=1}^n \lambda_i e^{-\lambda_i x_i}, \quad \text{if } \sum_{i=1}^n x_i > c$$


This is easily accomplished by starting with an initial vector $\mathbf{x} = (x_1,\dots,x_n)$
satisfying $x_i > 0$, $i = 1,\dots,n$, and $\sum_i x_i > c$. Then generate a random number
$U$ and set $I = \mathrm{Int}(nU + 1)$. Suppose that $I = i$. Now, we want to generate
an exponential random variable $X$ with rate $\lambda_i$ conditioned on the event that
$X + \sum_{j\ne i} x_j > c$. That is, we want to generate the value of $X$ conditional on
the event that it exceeds $c - \sum_{j\ne i} x_j$. Hence, using the fact that an exponential
conditioned to be greater than a positive constant is distributed as the constant
plus the exponential, we see that we should generate an exponential random
variable $Y$ with rate $\lambda_i$ (say, let $Y = -\frac{1}{\lambda_i}\log U$), and set

$$X = Y + \Big(c - \sum_{j\ne i} x_j\Big)^{+}$$

where $b^+$ is equal to $b$ when $b > 0$ and is 0 otherwise. The value of $x_i$ should
then be reset to equal $X$ and a new iteration of the algorithm begun. □
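A minimal Python sketch of this sampler follows. The function name `gibbs_exp_sum_exceeds` and the starting vector (every coordinate equal to $2c/n$, so that the sum is $2c > c$) are illustrative assumptions; the code uses $1 - U$ in place of $U$ inside the logarithm only to avoid $\log 0$, which leaves the distribution unchanged.

```python
import math
import random

def gibbs_exp_sum_exceeds(rates, c, iters, seed=0):
    """Random-scan Gibbs sampler for independent exponentials X_i with the given
    rates, conditioned on X_1 + ... + X_n > c (c > 0).  Returns the visited states."""
    rng = random.Random(seed)
    n = len(rates)
    x = [2.0 * c / n] * n                   # x_i > 0 and sum(x) = 2c > c
    states = []
    for _ in range(iters):
        i = int(n * rng.random())           # I = Int(nU + 1), 0-based here
        rest = sum(x) - x[i]                # sum_{j != i} x_j
        y = -math.log(1.0 - rng.random()) / rates[i]   # exponential with rate lambda_i
        x[i] = y + max(c - rest, 0.0)       # X = Y + (c - sum_{j != i} x_j)^+
        states.append(list(x))
    return states
```

With a single coordinate the chain simply produces independent draws of $X$ given $X > c$, whose mean is $c + 1/\lambda$; this gives an easy sanity check.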


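Suppose now that we are interested in estimating

$$\alpha = P\{h(\mathbf{X}) > a\}$$

where $\mathbf{X} = (X_1,\dots,X_n)$ is a random vector, $h$ is an arbitrary function of $\mathbf{X}$,
and $\alpha$ is very small. Because a generated value of $h(\mathbf{X})$ will almost always be
less than $a$, it would take a huge amount of time to obtain an estimator whose
error is small relative to $\alpha$ if we use a straightforward Gibbs sampler approach
to generate a sequence of random vectors whose distribution converges to that
of $\mathbf{X}$. Consider, however, the following approach.

To begin, note that for values $-\infty = a_0 < a_1 < a_2 < \dots < a_k = a$,

$$\alpha = \prod_{i=1}^k P\{h(\mathbf{X}) > a_i \mid h(\mathbf{X}) > a_{i-1}\}$$

(the product telescopes, because $P\{h(\mathbf{X}) > a_i \mid h(\mathbf{X}) > a_{i-1}\} = P\{h(\mathbf{X}) > a_i\}/P\{h(\mathbf{X}) > a_{i-1}\}$
and $P\{h(\mathbf{X}) > a_0\} = 1$). Thus, we can obtain an estimator of $\alpha$ by taking the product of estimators of the
quantities $P\{h(\mathbf{X}) > a_i \mid h(\mathbf{X}) > a_{i-1}\}$, for $i = 1,\dots,k$. For this to be efficient,
the values $a_i$, $i = 1,\dots,k$, should be chosen so that the probabilities $P\{h(\mathbf{X}) > a_i \mid h(\mathbf{X}) > a_{i-1}\}$
are all of moderate size.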

To estimate $P\{h(\mathbf{X}) > a_i \mid h(\mathbf{X}) > a_{i-1}\}$, we make use of the Gibbs sampler as
follows.

1. Set $J = N = 0$.

2. Choose a vector $\mathbf{x}$ such that $h(\mathbf{x}) > a_{i-1}$.

3. Generate a random number $U$ and set $I = \mathrm{Int}(nU) + 1$.

4. If $I = k$, generate $X$ having the conditional distribution of $X_k$ given that
$X_j = x_j$, $j \ne k$.

5. If $h(x_1,\dots,x_{k-1},X,x_{k+1},\dots,x_n) \le a_{i-1}$, return to 4.

6. $N = N + 1$, $x_k = X$.

7. If $h(x_1,\dots,x_n) > a_i$, then $J = J + 1$.

8. Go to 3.

The ratio of the final value of $J$ to that of $N$ is the estimator of $P\{h(\mathbf{X}) > a_i \mid h(\mathbf{X}) > a_{i-1}\}$.
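Steps 1 to 8, together with the product over levels, can be sketched generically in Python as below. This is an illustrative sketch, not the book's code: `estimate_level`, `estimate_small_prob`, and the callback `sample_coord(k, x, rng)` (which must draw $X_k$ from its conditional distribution given the other coordinates) are hypothetical names. The toy function `uniform_sum_demo` assumes three independent uniform $(0, 1)$ coordinates and $h(\mathbf{x}) = x_1 + x_2 + x_3$, for which $\alpha = P\{h(\mathbf{X}) > 2.5\} = 1/48$ exactly, so the estimate can be checked.

```python
import math
import random

def estimate_level(h, x0, sample_coord, a_prev, a_cur, iters, rng):
    """Steps 1-8: estimate P{h(X) > a_cur | h(X) > a_prev} with a Gibbs sampler
    kept inside {x : h(x) > a_prev}.  sample_coord(k, x, rng) must return a draw
    from the conditional distribution of X_k given X_j = x_j, j != k."""
    x = list(x0)                         # step 2 (caller ensures h(x0) > a_prev)
    J = N = 0                            # step 1
    n = len(x)
    for _ in range(iters):               # each pass is steps 3-8
        k = int(n * rng.random())        # step 3 (0-based index)
        while True:                      # steps 4-5: redraw until h(x) > a_prev
            x[k] = sample_coord(k, x, rng)
            if h(x) > a_prev:
                break
        N += 1                           # step 6: x[k] now holds the accepted X
        if h(x) > a_cur:                 # step 7
            J += 1
    return J / N

def estimate_small_prob(h, levels, starts, sample_coord, iters, seed=0):
    """alpha = prod_i P{h(X) > a_i | h(X) > a_{i-1}}; each starts[i-1] must
    satisfy h(starts[i-1]) > levels[i-1]."""
    rng = random.Random(seed)
    alpha = 1.0
    for i in range(1, len(levels)):
        alpha *= estimate_level(h, starts[i - 1], sample_coord,
                                levels[i - 1], levels[i], iters, rng)
    return alpha

def uniform_sum_demo(iters=30000, seed=7):
    """Hypothetical toy check: X_1, X_2, X_3 i.i.d. uniform(0, 1), h = sum,
    for which alpha = P{X_1 + X_2 + X_3 > 2.5} = 1/48 exactly."""
    return estimate_small_prob(sum, [-math.inf, 1.5, 2.0, 2.5],
                               [[0.5] * 3, [0.6] * 3, [0.9] * 3],
                               lambda k, x, rng: rng.random(), iters, seed)
```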


Example 10e Suppose in the queueing network model of Example 10c
that the service times at server $i$ are exponential with rate $\mu_i$, $i = 1,\dots,m+1$,
and that when a customer completes service at server $i$ then, independent of all
else, that customer then moves over to join the queue (or enter service if the
server is free) at server $j$ with probability $P_{ij}$, where $\sum_{j=1}^{m+1} P_{ij} = 1$. It can then
be shown that the limiting probability mass function of the number of customers
at servers $1,\dots,m$ is given, for $\sum_j n_j \le r$, by

$$p(n_1,\dots,n_m) = C\prod_{j=1}^m \left(\frac{\pi_j\,\mu_{m+1}}{\pi_{m+1}\,\mu_j}\right)^{n_j}$$

where $\pi_j$, $j = 1,\dots,m+1$, are the stationary probabilities of the Markov chain
with transition probabilities $P_{ij}$. That is, they are the unique solution of

$$\pi_j = \sum_{i=1}^{m+1}\pi_i P_{ij}, \qquad \sum_{j=1}^{m+1}\pi_j = 1$$

If we renumber the servers so that $\max_j(\pi_j/\mu_j) = \pi_{m+1}/\mu_{m+1}$, then letting
$a_j = \pi_j\mu_{m+1}/(\pi_{m+1}\mu_j)$, we have that for $\sum_j n_j \le r$,

$$p(n_1,\dots,n_m) = C\prod_{j=1}^m (a_j)^{n_j}$$

where $0 < a_j \le 1$. It easily follows from this that the conditional distribution of
the number of customers at server $i$, given the numbers $n_j$, $j \ne i$, at the other
$m - 1$ servers, is distributed as the conditional distribution of $-1$ plus a geometric
random variable with parameter $1 - a_i$, given that the geometric is less than or
equal to $r + 1 - \sum_{j\ne i} n_j$.

In the case where the $\pi_j$ and $\mu_j$ are both constant for all $j$, the conditional
distribution of the number of customers at server $i$, given the numbers $n_j$, $j \ne i$,
at the other servers excluding server $m+1$, is the discrete uniform distribution
on $0, 1, \dots, r - \sum_{j\ne i} n_j$. Suppose this is the case and that $m = 20$, $r = 100$, and
that we are interested in estimating the limiting probability that the number of
customers at server 1 (call it $X_1$) is greater than 18. Letting $t_0 = -1$, $t_1 = 5$,
$t_2 = 9$, $t_3 = 12$, $t_4 = 15$, $t_5 = 17$, $t_6 = 18$, we can use the Gibbs sampler to successively
estimate the quantities $P\{X_1 > t_i \mid X_1 > t_{i-1}\}$, $i = 1,2,3,4,5,6$. We would
estimate, say, $P\{X_1 > 17 \mid X_1 > 15\}$ by starting with a vector $n_1,\dots,n_{20}$ for
which $n_1 > 15$ and $s = \sum_i n_i \le 100$, where $s$ always denotes the current value of $\sum_i n_i$. We then generate a random number $U$
and let $I = \mathrm{Int}(20U + 1)$. A second random number $V$ is now generated. If $I = 1$,
then $n_1$ is reset to

$$n_1 = \mathrm{Int}\big((85 - s + n_1)V\big) + 16$$

If $I \ne 1$, then $n_I$ is reset to

$$n_I = \mathrm{Int}\big((101 - s + n_I)V\big)$$

The next iteration of the algorithm then begins; the fraction of iterations for
which $n_1 > 17$ is the estimate of $P\{X_1 > 17 \mid X_1 > 15\}$. □
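For the symmetric case just described, a Python sketch of the level-by-level estimate might look as follows. The function names are hypothetical; index 0 in the code plays the role of server 1, and the starting vector puts $t_{i-1} + 1$ customers at server 1 and none elsewhere (an arbitrary admissible choice). Because in this case $p$ is uniform on $\{\mathbf{n} : \sum_j n_j \le r\}$, the exact answer $\binom{101}{20}/\binom{120}{20}$, about 0.0227, is also available for comparison.

```python
import math
import random

def x1_level_estimate(lo, hi, iters, m=20, r=100, seed=0):
    """Estimate P{X_1 > hi | X_1 > lo} in the symmetric case of Example 10e,
    where every full conditional is discrete uniform.  Index 0 is server 1."""
    rng = random.Random(seed)
    n = [0] * m
    n[0] = lo + 1                          # start with n_1 > lo and sum <= r
    s = sum(n)                             # current value of sum_i n_i
    hits = 0
    for _ in range(iters):
        i = int(m * rng.random())          # I = Int(mU + 1), 0-based
        v = rng.random()
        if i == 0:                         # uniform on lo+1, ..., r - (s - n_1)
            new = int((r - lo - s + n[0]) * v) + lo + 1
        else:                              # uniform on 0, ..., r - (s - n_i)
            new = int((r + 1 - s + n[i]) * v)
        s += new - n[i]
        n[i] = new
        hits += n[0] > hi
    return hits / iters

def prob_x1_exceeds(t=(-1, 5, 9, 12, 15, 17, 18), iters=100000):
    """Product of the successive conditional estimates for t_0 < t_1 < ... ."""
    est = 1.0
    for k in range(1, len(t)):
        est *= x1_level_estimate(t[k - 1], t[k], iters, seed=k)
    return est

def exact_x1_exceeds(t_last=18, m=20, r=100):
    """Uniform case: P{X_1 >= t_last + 1} = C(r - t_last - 1 + m, m) / C(r + m, m)."""
    return math.comb(r - t_last - 1 + m, m) / math.comb(r + m, m)
```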


The idea of writing a small probability as the product of more moderately sized
conditional probabilities and then estimating each of the conditional probabilities
in turn does not require that the Gibbs sampler be employed. Another variant
of the Hastings–Metropolis algorithm might be more appropriate. We illustrate
by an example that was previously treated, in Example 8v, by using importance
sampling.


Example 10f Suppose that we are interested in estimating the number of
permutations $\mathbf{x} = (x_1,\dots,x_n)$ for which $T(\mathbf{x}) > a$, where $T(\mathbf{x}) = \sum_{j=1}^n j x_j$ and
$a$ is such that this number of permutations is very small in comparison to
$n!$. If we let $\mathbf{X} = (X_1,\dots,X_n)$ be equally likely to be any of the $n!$ permutations
and set

$$\alpha = P\{T(\mathbf{X}) > a\}$$

then $\alpha$ is small and the quantity of interest is $\alpha\, n!$. Letting $0 = a_0 < a_1 < \dots <
a_k = a$, we have that

$$\alpha = \prod_{i=1}^k P\{T(\mathbf{X}) > a_i \mid T(\mathbf{X}) > a_{i-1}\}$$

To estimate $P\{T(\mathbf{X}) > a_i \mid T(\mathbf{X}) > a_{i-1}\}$ we use the Hastings–Metropolis algorithm
as in Examples 10a or 10b to generate a Markov chain whose limiting
distribution is

$$\pi(\mathbf{x}) = \frac{1}{N_{i-1}}, \quad \text{if } T(\mathbf{x}) > a_{i-1}$$

where $N_{i-1}$ is the number of permutations $\mathbf{x}$ such that $T(\mathbf{x}) > a_{i-1}$. The proportion
of the generated states $\mathbf{x}$ of this Markov chain that have $T(\mathbf{x}) > a_i$ is the estimate
of $P\{T(\mathbf{X}) > a_i \mid T(\mathbf{X}) > a_{i-1}\}$. □
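A small Python sketch of this scheme follows. The text leaves the proposal to Examples 10a or 10b; here we assume, for illustration, that the proposal interchanges two randomly chosen positions. Since that proposal is symmetric and the target is uniform on $\{\mathbf{x} : T(\mathbf{x}) > a_{i-1}\}$, a proposed permutation is accepted exactly when it stays in that set. All function names are hypothetical, and the brute-force counter is included only as a check for small $n$.

```python
import itertools
import math
import random

def T(x):
    """T(x) = sum_j j * x_j with 1-based positions j."""
    return sum(j * xj for j, xj in enumerate(x, start=1))

def perm_level_estimate(n, a_prev, a_cur, iters, rng):
    """Metropolis chain that is uniform on {x : T(x) > a_prev}: propose swapping
    two random positions and reject any proposal that leaves the set."""
    x = list(range(1, n + 1))            # identity maximizes T, so T(x) > a_prev
    hits = 0
    for _ in range(iters):
        p, q = rng.sample(range(n), 2)
        x[p], x[q] = x[q], x[p]
        t = T(x)
        if t <= a_prev:                  # outside the target set: undo the swap
            x[p], x[q] = x[q], x[p]
            t = T(x)
        hits += t > a_cur
    return hits / iters

def count_above(n, levels, iters=50000, seed=0):
    """Estimate #{permutations x : T(x) > levels[-1]} as alpha * n!."""
    rng = random.Random(seed)
    alpha = 1.0
    for i in range(1, len(levels)):
        alpha *= perm_level_estimate(n, levels[i - 1], levels[i], iters, rng)
    return alpha * math.factorial(n)

def count_above_exact(n, a):
    """Brute-force count, feasible only for small n (used as a check)."""
    return sum(1 for x in itertools.permutations(range(1, n + 1)) if T(x) > a)
```

For $n = 6$ and $a = 88$ the exact count is 12 (the identity, the 5 adjacent interchanges, and the 6 products of two disjoint adjacent interchanges), which the estimate based on the levels $0 < 80 < 85 < 88$ should approximate.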
In many applications it is relatively easy to recognize the form of the conditional
distributions needed in the Gibbs sampler.


Example 10g Suppose that for some nonnegative function $h(y, z)$ the joint
density of the nonnegative random variables $X$, $Y$, and $Z$ is

$$f(x, y, z) = C x^{y-1}(1 - x)^{zy} h(y, z), \quad \text{for } 0 < x < 0.5$$

Then the conditional density of $X$ given that $Y = y$ and $Z = z$ is

$$f(x \mid y, z) = \frac{f(x, y, z)}{f_{Y,Z}(y, z)}$$

Since $y$ and $z$ are fixed and $x$ is the argument of this conditional density, we can
write the preceding as

$$f(x \mid y, z) = C_1 f(x, y, z)$$

where $C_1$ does not depend on $x$. Hence, we have that

$$f(x \mid y, z) = C_2 x^{y-1}(1 - x)^{zy}, \quad 0 < x < 0.5$$

where $C_2$ does not depend on $x$. But we can recognize this as the conditional
density of a beta random variable with parameters $y$ and $zy + 1$ that is conditioned
to be in the interval $(0, 0.5)$. □
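When the interval $(0, 0.5)$ holds a reasonable share of the beta mass, one simple way to sample this conditional density is rejection: draw from the untruncated beta$(y, zy + 1)$ distribution and keep the first value below 0.5. The Python sketch below assumes that approach (the function name is hypothetical); it becomes slow when $P\{\text{beta}(y, zy + 1) < 0.5\}$ is small, in which case an inverse-transform method would be preferable.

```python
import random

def sample_x_given_yz(y, z, rng):
    """Draw from f(x | y, z) of Example 10g, a beta(y, z*y + 1) density
    restricted to (0, 0.5), by rejection from the untruncated beta."""
    while True:
        x = rng.betavariate(y, z * y + 1)
        if x < 0.5:
            return x
```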


Rather than always choosing a random coordinate to update on, the Gibbs sampler
can also consider the coordinates in sequence. That is, on the first iteration we
could set $I = 1$, then set $I = 2$ on the next iteration, then $I = 3$, and so on until
the $n$th iteration, where $I = n$. On the next iteration, we start over. We illustrate
this with our next example, which is concerned with modeling the numbers of
home runs hit by two of the best hitters in baseball.


Example 10h Let $N_1(t)$ denote the number of home runs hit in the first
$100t$ percent of a baseball season, $0 \le t \le 1$, by the baseball player Barry Bonds;
similarly, let $N_2(t)$ be the number hit by Ken Griffey.

Suppose that there are random variables $W_1$ and $W_2$ such that given that
$W_1 = w_1$ and $W_2 = w_2$, $\{N_1(t), 0 \le t \le 1\}$ and $\{N_2(t), 0 \le t \le 1\}$ are independent
Poisson processes with respective rates $w_1$ and $w_2$. Furthermore, suppose that
$W_1$ and $W_2$ are independent exponential random variables with rate $Y$, which is
itself a random variable that is uniformly distributed between 0.02 and 0.10. In
other words, the assumption is that the players hit home runs in accordance with
Poisson processes whose rates are random variables from a distribution that is
defined in terms of a parameter that is itself a random variable with a specified
distribution.
Suppose that Bonds has hit 25 and Griffey 18 home runs in the first half of
the season. Give a method for estimating the mean number they each hit in the
full season.


Solution Summing up the model, there are random variables $Y$, $W_1$, $W_2$ such
that:

1. $Y$ is uniform on $(0.02, 0.10)$.
2. Given that $Y = y$, $W_1$ and $W_2$ are independent and identically distributed
exponential random variables with rate $y$.
3. Given that $W_1 = w_1$ and $W_2 = w_2$, $\{N_1(t)\}$ and $\{N_2(t)\}$ are independent
Poisson processes with rates $w_1$ and $w_2$.

To find $E[N_1(1) \mid N_1(0.5) = 25, N_2(0.5) = 18]$, start by conditioning on $W_1$:

$$E[N_1(1) \mid N_1(0.5) = 25, N_2(0.5) = 18, W_1] = 25 + 0.5W_1$$

Taking the conditional expectation, given that $N_1(0.5) = 25$ and $N_2(0.5) = 18$,
of the preceding yields that

$$E[N_1(1) \mid N_1(0.5) = 25, N_2(0.5) = 18] = 25 + 0.5E[W_1 \mid N_1(0.5) = 25, N_2(0.5) = 18]$$

Similarly,

$$E[N_2(1) \mid N_1(0.5) = 25, N_2(0.5) = 18] = 18 + 0.5E[W_2 \mid N_1(0.5) = 25, N_2(0.5) = 18]$$

We can now estimate these conditional expectations by using the Gibbs sampler.
To begin, note the joint distribution: For $0.02 < y < 0.10$, $w_1 > 0$, $w_2 > 0$,

$$f(y, w_1, w_2, N_1(0.5) = 25, N_2(0.5) = 18) = C y^2 e^{-(w_1 + w_2)y}\, e^{-(w_1 + w_2)/2}(w_1)^{25}(w_2)^{18}$$

where $C$ does not depend on any of $y$, $w_1$, $w_2$. (The factor $y^2 e^{-(w_1 + w_2)y}$ comes
from the two exponential densities with rate $y$, and the factor $e^{-(w_1 + w_2)/2}(w_1)^{25}(w_2)^{18}$
from the Poisson probabilities of 25 and 18 home runs in half a season.) Hence, for $0.02 < y < 0.10$,

$$f(y \mid w_1, w_2, N_1 = 25, N_2 = 18) = C_1 y^2 e^{-(w_1 + w_2)y}$$

which shows that the conditional distribution of $Y$ given $w_1$, $w_2$, $N_1 = 25$, $N_2 =
18$ is that of a gamma random variable with parameters 3 and $w_1 + w_2$ that is
conditioned to be between 0.02 and 0.10. Also,

$$f(w_1 \mid y, w_2, N_1(0.5) = 25, N_2(0.5) = 18) = C_2 e^{-(y + 1/2)w_1}(w_1)^{25}$$

from which we can conclude that the conditional distribution of $W_1$ given
$y$, $w_2$, $N_1 = 25$, $N_2 = 18$ is gamma with parameters 26 and $y + \frac12$. Similarly, the
conditional distribution of $W_2$ given $y$, $w_1$, $N_1 = 25$, $N_2 = 18$ is gamma with
parameters 19 and $y + \frac12$.

Hence, starting with values $y$, $w_1$, $w_2$, where $0.02 < y < 0.10$ and $w_j > 0$, the
Gibbs sampler is as follows.

1. Generate the value of a gamma random variable with parameters 3 and
$w_1 + w_2$ that is conditioned to be between 0.02 and 0.10, and let it be the
new value of $y$.

2. Generate the value of a gamma random variable with parameters 26 and
$y + \frac12$, and let it be the new value of $w_1$.

3. Generate the value of a gamma random variable with parameters 19 and
$y + \frac12$, and let it be the new value of $w_2$.

4. Return to Step 1.

The average of the values of $w_1$ is our estimate of $E[W_1 \mid N_1(0.5) = 25, N_2(0.5) =
18]$, and the average of the values of $w_2$ is our estimate of $E[W_2 \mid N_1(0.5) =
25, N_2(0.5) = 18]$. One-half of the former plus 25 is our estimate of the mean
number of home runs that Barry Bonds will hit over the year, and one-half
of the latter plus 18 is our estimate of the mean number that Ken Griffey
will hit.

It should be noted that the numbers of home runs hit by the two players are 
dependent, with their dependence caused by their common dependence on the 
value of the random variable Y. That is, the value of Y (which might relate 
to such quantities as the average degree of liveliness of the baseballs used that 
season or the average weather conditions for the year) affects the distribution 
of the mean number of home runs that each player will hit in the year. Thus, 
information about the number of home runs hit by one of the players yields 
probabilistic information about the value of Y that affects the distribution of 
the number of home runs of the other player. This type of model, where there 
is a common random variable (Y in this case) that affects the distributions of 
the conditional parameters of the random variables of interest, is known as a
hierarchical Bayes model. □
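A Python sketch of the three-step Gibbs sampler of Example 10h is given below; it is an illustration under stated assumptions rather than a definitive implementation. The truncated gamma in Step 1 is sampled by inverting its distribution function, which for shape 3 and rate $\lambda = w_1 + w_2$ has the closed form $F(y) = 1 - e^{-\lambda y}(1 + \lambda y + (\lambda y)^2/2)$, using bisection; the starting values $y = 0.05$, $w_1 = w_2 = 40$ are arbitrary. Note that Python's `gammavariate(alpha, beta)` takes a shape and a scale, so the rate $y + \frac12$ enters as the scale $1/(y + \frac12)$.

```python
import math
import random

def truncated_gamma3(rate, lo, hi, rng):
    """Gamma(shape 3, given rate) conditioned on (lo, hi): pick u uniformly
    between F(lo) and F(hi) and solve F(y) = u by bisection."""
    def cdf(y):
        t = rate * y
        return 1.0 - math.exp(-t) * (1.0 + t + t * t / 2.0)
    u = cdf(lo) + rng.random() * (cdf(hi) - cdf(lo))
    a, b = lo, hi
    for _ in range(50):                  # F is increasing, so bisection converges
        mid = 0.5 * (a + b)
        if cdf(mid) < u:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

def home_run_gibbs(iters=20000, seed=0):
    """Systematic-scan Gibbs sampler of Example 10h (Steps 1-3 repeated)."""
    rng = random.Random(seed)
    y, w1, w2 = 0.05, 40.0, 40.0         # any start with 0.02 < y < 0.10, w_j > 0
    sum1 = sum2 = 0.0
    for _ in range(iters):
        y = truncated_gamma3(w1 + w2, 0.02, 0.10, rng)      # Step 1
        # gammavariate(shape, scale): rate y + 1/2 means scale 1 / (y + 1/2)
        w1 = rng.gammavariate(26, 1.0 / (y + 0.5))          # Step 2
        w2 = rng.gammavariate(19, 1.0 / (y + 0.5))          # Step 3
        sum1 += w1
        sum2 += w2
    bonds = 25 + 0.5 * sum1 / iters      # estimate of E[N_1(1) | data]
    griffey = 18 + 0.5 * sum2 / iters    # estimate of E[N_2(1) | data]
    return bonds, griffey
```

The function returns the two estimated season means: one-half of each average weight plus the corresponding first-half total.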


When applying the Gibbs sampler, it is not necessary to condition on all but one
of the variables. If it is possible to generate from joint conditional distributions,
then we may utilize them. For instance, suppose $n = 3$ and that we can generate
from the conditional distribution of any two of them given the third. Then, at
each iteration we could generate a random number $U$, set $I = \mathrm{Int}(3U + 1)$, and
generate from the joint distribution of $X_j$, $X_k$, $j, k \ne I$, given the present value
of $X_I$.
Example 10i Let $X_i$, $i = 1,2,3,4,5$, be independent exponential random
variables, with $X_i$ having mean $i$, and suppose we are interested in using simulation
to estimate

$$\beta = P\Big\{\prod_{i=1}^5 X_i > 120 \,\Big|\, \sum_{i=1}^5 X_i = 15\Big\}$$

We can accomplish this by using the Gibbs sampler via a random choice of two
of the coordinates. To begin, suppose that $X$ and $Y$ are independent exponentials
with respective rates $\lambda$ and $\mu$, where $\mu < \lambda$, and let us find the conditional
distribution of $X$ given that $X + Y = a$, as follows.

$$\begin{aligned}
f_{X \mid X+Y}(x \mid a) &= C_1 f_{X,Y}(x, a - x), && 0 < x < a \\
&= C_2 e^{-\lambda x} e^{-\mu(a - x)}, && 0 < x < a \\
&= C_3 e^{-(\lambda - \mu)x}, && 0 < x < a
\end{aligned}$$

which shows that the conditional distribution is that of an exponential with rate
$\lambda - \mu$ that is conditioned to be less than $a$.

Using this result, we can estimate $\beta$ by letting the initial state $(x_1, x_2, x_3, x_4, x_5)$
be any five positive numbers that sum to 15. Now randomly choose two elements
from the set $\{1, 2, 3, 4, 5\}$; say $I = 2$ and $J = 5$ are chosen. Then the conditional
distribution of $X_2, X_5$ given the other values is the conditional distribution of
two independent exponentials with means 2 and 5, given that their sum is
$15 - x_1 - x_3 - x_4$. But, by the preceding, the values of $X_2$ and $X_5$ can be obtained
by generating the value of an exponential with rate $\frac12 - \frac15 = \frac{3}{10}$ that is conditioned
to be less than $15 - x_1 - x_3 - x_4$, then setting $x_2$ equal to that value and resetting
$x_5$ to make $\sum_{i=1}^5 x_i = 15$. This process should be continually repeated, and the
proportion of state vectors $\mathbf{x}$ having $\prod_{i=1}^5 x_i > 120$ is the estimate of $\beta$. □
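The pairwise Gibbs sampler of Example 10i can be sketched in Python as follows (the function name is hypothetical). Within each chosen pair, the coordinate with the smaller mean has the larger rate $\lambda$, so it is drawn from an exponential with rate $\lambda - \mu$ conditioned to be less than the pair's current sum $a$, using the inverse transform $x = -\log\big(1 - U(1 - e^{-(\lambda - \mu)a})\big)/(\lambda - \mu)$; the partner is then set so that the total stays 15. The starting state with all coordinates equal to 3 is an arbitrary choice.

```python
import math
import random

def gibbs_pairs_estimate(iters=100000, total=15.0, threshold=120.0, seed=0):
    """Estimate P{prod X_i > 120 | sum X_i = 15} for independent X_i with mean i,
    i = 1..5, by resampling a randomly chosen pair of coordinates at each step."""
    rng = random.Random(seed)
    x = [total / 5] * 5                  # five positive numbers summing to 15
    hits = 0
    for _ in range(iters):
        i, j = sorted(rng.sample(range(5), 2))   # index i has the smaller mean
        rate = 1.0 / (i + 1) - 1.0 / (j + 1)     # lambda - mu > 0
        a = x[i] + x[j]                          # equals 15 minus the other three
        u = rng.random()
        # exponential(rate) conditioned to be less than a, by inverse transform
        x[i] = -math.log(1.0 - u * (1.0 - math.exp(-rate * a))) / rate
        x[j] = a - x[i]
        hits += math.prod(x) > threshold
    return hits / iters, x
```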


Example 10j Suppose that $n$ independent trials are performed, each of
which results in one of the outcomes $1, 2, \dots, r$, with respective probabilities
$p_1, p_2, \dots, p_r$, $\sum_{i=1}^r p_i = 1$, and let $X_i$ denote the number of trials that result in
outcome $i$. The random variables $X_1,\dots,X_r$, whose joint distribution is called
the multinomial distribution, were introduced in Example 4g where it was shown
how they can be simulated. Now suppose $n > r$, and that we want to simulate
$X_1,\dots,X_r$ conditional on the event that they are all positive. That is, we want to
simulate the result of the trials conditional on the event that each outcome occurs
at least once. How can this be efficiently accomplished when this conditioning
event has a very small probability?


Solution To begin, it should be noted that it would be wrong to suppose that
we could just generate the result of $n - r$ of these trials, and then let $X_i$ equal 1
plus the number of these $n - r$ trials that result in outcome $i$. (That is, attempting
to put aside the $r$ trials in which all outcomes occur once, and then simulating
the remaining $n - r$ trials, does not work.) To see why, let $n = 4$ and $r = 2$. Then,
under the putting-aside method, the probability that exactly 2 of the trials would
result in outcome 1 is $2p(1 - p)$, where $p = p_1$. However, for the multinomial
random variables $X_1, X_2$,

$$\begin{aligned}
P\{X_1 = 2 \mid X_1 > 0, X_2 > 0\} &= \frac{P\{X_1 = 2\}}{P\{X_1 > 0, X_2 > 0\}} \\
&= \frac{P\{X_1 = 2\}}{1 - P\{X_1 = 4\} - P\{X_2 = 4\}} \\
&= \frac{\binom{4}{2}p^2(1 - p)^2}{1 - p^4 - (1 - p)^4}
\end{aligned}$$

As the preceding is not equal to $2p(1 - p)$ (try $p = 1/2$, for which it equals $3/7$ rather than $1/2$), the method does not
work.

We can use the Gibbs sampler to generate a Markov chain having the appropriate
limiting probabilities. Let the initial state be any arbitrary vector of $r$
positive integers whose sum is $n$, and let the states change in the following
manner. Whenever the state is $x_1,\dots,x_r$, generate the next state by first randomly
choosing two of the indices from $1,\dots,r$. If $i$ and $j$ are chosen, let
$s = x_i + x_j$, and simulate $X_i$ and $X_j$ from their conditional distribution given that
$X_k = x_k$, $k \ne i, j$. Because conditional on $X_k = x_k$, $k \ne i, j$, there are a total of
$s$ trials that result in either outcome $i$ or $j$, it follows that the number of these
trials that result in outcome $i$ is distributed as a binomial random variable with
parameters $\big(s, \frac{p_i}{p_i + p_j}\big)$ that is conditioned to be one of the values $1,\dots,s - 1$.
Consequently, the discrete inverse transform method can be used to simulate
such a random variable; if its value is $v$, then the next state is the same as the
previous one with the exception that the new values of $x_i$ and $x_j$ are $v$ and $s - v$.
Continuing on in this manner results in a sequence of states whose limiting
distribution is that of the multinomial conditional on the event that all outcomes
occur at least once. □
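Below is a Python sketch of this sampler, offered as an illustration; the function names are hypothetical. The conditioned binomial is generated by the discrete inverse transform method on the masses of $1, \dots, s - 1$, computed through the ratio $P\{k+1\}/P\{k\} = \frac{s-k}{k+1}\cdot\frac{p}{1-p}$ and left unnormalized, which is adequate for moderate $s$ (very large $s$ could overflow).

```python
import random

def conditioned_binomial(s, p, rng):
    """Binomial(s, p) conditioned to lie in {1, ..., s-1} (needs s >= 2, 0 < p < 1),
    by the discrete inverse transform method applied to the truncated masses."""
    ratio = p / (1.0 - p)
    mass = [1.0]                              # proportional to P{Bin = 0}
    for k in range(s):                        # P{k+1} / P{k} = (s-k)/(k+1) * p/(1-p)
        mass.append(mass[k] * (s - k) / (k + 1) * ratio)
    u = rng.random() * sum(mass[1:s])
    k, acc = 1, mass[1]
    while acc < u and k < s - 1:             # walk up the truncated CDF
        k += 1
        acc += mass[k]
    return k

def gibbs_positive_multinomial(n, p, iters, seed=0):
    """Gibbs sampler for a multinomial(n; p_1, ..., p_r) conditioned on all X_i >= 1."""
    rng = random.Random(seed)
    r = len(p)
    x = [1] * r
    x[0] += n - r                            # r positive integers summing to n
    states = []
    for _ in range(iters):
        i, j = rng.sample(range(r), 2)       # two distinct indices
        s = x[i] + x[j]
        v = conditioned_binomial(s, p[i] / (p[i] + p[j]), rng)
        x[i], x[j] = v, s - v
        states.append(tuple(x))
    return states
```

With $n = 4$, $r = 2$ and $p_1 = p_2 = 1/2$, every step redraws $X_1$ from its exact conditional distribution, so the long-run fraction of states with $X_1 = 2$ should be near the value $3/7$ computed above.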


Remarks 


1. The same argument can be used to verify that we obtain the appropriate
limiting mass function when we consider the coordinates in sequence and
apply the Gibbs sampler (as in Example 10h), or when we use it via conditioning
on less than all but one of the values (as in Examples 10i and 10j). These
results are proven by noticing that if one chooses the initial state according
to the mass function $f$, then, in either case, the next state also has mass
function $f$. But this shows that $f$ satisfies the equations (10.1), implying by
uniqueness that $f$ is the limiting mass function.
2. Suppose you are using the Gibbs sampler to estimate $E[X_i]$ in a situation
where the conditional means $E[X_i \mid X_j, j \ne i]$ are easily computed. Then,
rather than using the average of the successive values of $X_i$ as the estimator,
it is usually better to use the average of the conditional expectations. That
is, if the present state is $\mathbf{x}$, then take $E[X_i \mid X_j = x_j, j \ne i]$ rather than $x_i$
as the estimate from that iteration. Similarly, if you are trying to estimate
$P\{X_i = x\}$, and $P\{X_i = x \mid X_j, j \ne i\}$ is easily computed, then the average of
these quantities is usually a better estimator than is the proportion of time
in which the $i$th component of the state vector equals $x$.

3. The Gibbs sampler shows that knowledge of all the conditional distributions
of $X_i$ given the values of the other $X_j$, $j \ne i$, determines the joint distribution
of $\mathbf{X}$. □
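To illustrate Remark 2, the Python sketch below reuses the symmetric network of Example 10e, where the conditional distribution of $n_1$ given the other coordinates is uniform on $0, \dots, r - \sum_{j \ne 1} n_j$, so that $E[X_1 \mid X_j = x_j, j \ne 1] = \big(r - \sum_{j \ne 1} x_j\big)/2$. It returns both the plain average of $x_1$ and the average of these conditional means. The smaller values $m = 5$, $r = 20$ and the function name are assumptions, chosen so that the exact answer $r/(m + 1)$ is easy to check (by symmetry, the $m$ stations and station $m + 1$ all have the same mean).

```python
import random

def rao_blackwell_demo(m=5, r=20, iters=100000, seed=0):
    """Symmetric network of Example 10e with m stations and r customers:
    each full conditional of n_i is uniform on 0, ..., r - sum_{j != i} n_j.
    Returns (plain average of x_1, average of E[X_1 | X_j = x_j, j != 1])."""
    rng = random.Random(seed)
    n = [0] * m
    s = 0                                    # current value of sum_j n_j
    raw = cond = 0.0
    for _ in range(iters):
        i = int(m * rng.random())
        bound = r - (s - n[i])
        new = int((bound + 1) * rng.random())     # uniform on 0, ..., bound
        s += new - n[i]
        n[i] = new
        raw += n[0]
        cond += (r - (s - n[0])) / 2.0            # mean of the uniform conditional of n_1
    return raw / iters, cond / iters
```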


10.4 Simulated Annealing 


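Let $\mathcal{A}$ be a finite set of vectors and let $V(\mathbf{x})$ be a nonnegative function defined
on $\mathbf{x} \in \mathcal{A}$, and suppose that we are interested in finding its maximal value and
at least one argument at which the maximal value is attained. That is, letting

$$V^* = \max_{\mathbf{x} \in \mathcal{A}} V(\mathbf{x})$$

and

$$\mathcal{M} = \{\mathbf{x} \in \mathcal{A} : V(\mathbf{x}) = V^*\}$$

we are interested in finding $V^*$ as well as an element in $\mathcal{M}$. We will now show
how this can be accomplished by using the methods of this chapter.

To begin, let $\lambda > 0$ and consider the following probability mass function on
the set of values in $\mathcal{A}$:

$$p_\lambda(\mathbf{x}) = \frac{e^{\lambda V(\mathbf{x})}}{\sum_{\mathbf{x} \in \mathcal{A}} e^{\lambda V(\mathbf{x})}}$$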


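By multiplying the numerator and denominator of the preceding by $e^{-\lambda V^*}$, and
letting $|\mathcal{M}|$ denote the number of elements in $\mathcal{M}$, we see that

$$p_\lambda(\mathbf{x}) = \frac{e^{\lambda(V(\mathbf{x}) - V^*)}}{|\mathcal{M}| + \sum_{\mathbf{x} \notin \mathcal{M}} e^{\lambda(V(\mathbf{x}) - V^*)}}$$

However, since $V(\mathbf{x}) - V^* < 0$ for $\mathbf{x} \notin \mathcal{M}$, we obtain that as $\lambda \to \infty$,

$$p_\lambda(\mathbf{x}) \to \frac{\delta(\mathbf{x}, \mathcal{M})}{|\mathcal{M}|}$$

where $\delta(\mathbf{x}, \mathcal{M}) = 1$ if $\mathbf{x} \in \mathcal{M}$ and is 0 otherwise.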
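Numerically, $p_\lambda$ is best evaluated in exactly the shifted form just derived, since $e^{\lambda V(\mathbf{x})}$ itself overflows for large $\lambda$. The short Python sketch below (function name hypothetical) does this for a list of values $V(\mathbf{x})$ and shows the mass settling on the maximizers as $\lambda$ grows.

```python
import math

def p_lambda(values, lam):
    """p_lambda(x) proportional to exp(lambda * V(x)), computed in the stable
    form exp(lambda * (V(x) - V*)) / sum_x exp(lambda * (V(x) - V*))."""
    v_star = max(values)
    weights = [math.exp(lam * (v - v_star)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]
```

For the values $(1, 3, 3, 2)$, $\lambda = 0$ gives the uniform mass function, while $\lambda = 50$ puts essentially probability $1/2$ on each of the two maximizing points, in line with $\delta(\mathbf{x}, \mathcal{M})/|\mathcal{M}|$.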
Hence, if we let $\lambda$ be large and generate a Markov chain whose limiting
distribution is $p_\lambda(\mathbf{x})$, then most of the mass of this limiting distribution will
be concentrated on points in $\mathcal{M}$. An approach that is often useful in defining
such a chain is to introduce the concept of neighboring vectors and then use a
Hastings–Metropolis algorithm. For instance, we could say that the two vectors
$\mathbf{x} \in \mathcal{A}$ and $\mathbf{y} \in \mathcal{A}$ are neighbors if they differ in only a single coordinate or if
one can be obtained from the other by interchanging two of its components.
We could then let the target next state from $\mathbf{x}$ be equally likely to be any of its
neighbors, and if the neighbor $\mathbf{y}$ is chosen, then the next state becomes $\mathbf{y}$ with
probability

$$\min\left\{1, \frac{e^{\lambda V(\mathbf{y})}/|N(\mathbf{y})|}{e^{\lambda V(\mathbf{x})}/|N(\mathbf{x})|}\right\}$$

or remains $\mathbf{x}$ otherwise, where $|N(\mathbf{z})|$ is the number of neighbors of $\mathbf{z}$. If each
vector has the same number of neighbors (and if not already so, this can almost
always be arranged by increasing the state space and letting the $V$ value of any
new state equal 0), then when the state is $\mathbf{x}$, one of its neighbors, say $\mathbf{y}$, is
randomly chosen; if $V(\mathbf{y}) \ge V(\mathbf{x})$, then the chain moves to state $\mathbf{y}$, and if $V(\mathbf{y}) <
V(\mathbf{x})$, then the chain moves to state $\mathbf{y}$ with probability $\exp\{\lambda(V(\mathbf{y}) - V(\mathbf{x}))\}$ or
remains in state $\mathbf{x}$ otherwise.

One weakness with the preceding algorithm is that because $\lambda$ was chosen to
be large, if the chain enters a state $\mathbf{x}$ whose $V$ value is greater than that of
each of its neighbors, then it might take a long time for the chain to move to
a different state. That is, whereas a large value of $\lambda$ is needed for the limiting
distribution to put most of its weight on points in $\mathcal{M}$, such a value typically
requires a very large number of transitions before the limiting distribution is
approached. A second weakness is that since there are only a finite number
of possible values of $\mathbf{x}$, the whole concept of convergence seems meaningless
since we could always, in theory, just try each of the possible values and so
obtain convergence in a finite number of steps. Thus, rather than considering the
preceding from a strictly mathematical point of view, it makes more sense to
regard it as a heuristic approach, and in doing so it has been found to be useful
to allow the value of $\lambda$ to change with time.

A popular variation of the preceding, known as simulated annealing, operates
as follows. If the $n$th state of the Markov chain is $\mathbf{x}$, then a neighboring value is
randomly selected. If it is $\mathbf{y}$, then the next state is either $\mathbf{y}$ with probability

$$\min\left\{1, \frac{\exp\{\lambda_n V(\mathbf{y})\}/|N(\mathbf{y})|}{\exp\{\lambda_n V(\mathbf{x})\}/|N(\mathbf{x})|}\right\}$$

or it remains $\mathbf{x}$, where $\lambda_n$, $n \ge 1$, is a prescribed set of values that start out small
(thus resulting in a large number of changes in state) and then grow.

A computationally useful choice of $\lambda_n$ (and a choice that mathematically
results in convergence) is to let $\lambda_n = C\log(1 + n)$, where $C > 0$ is any fixed
positive constant (see Besag et al., 1995; Diaconis and Holmes, 1995). If we
then generate $m$ successive states $\mathbf{X}_1,\dots,\mathbf{X}_m$, we can then estimate $V^*$ by
$\max_{i=1,\dots,m} V(\mathbf{X}_i)$, and if the maximum occurs at $\mathbf{X}_{i^*}$, then this is taken as an
estimated point in $\mathcal{M}$.


Example 10k The Traveling Salesman Problem One version of
the traveling salesman problem is for the salesman to start at city 0 and then
sequentially visit all of the cities $1,\dots,r$. A possible choice is then a permutation
$x_1,\dots,x_r$ of $1,\dots,r$ with the interpretation that from 0 the salesman goes to
city $x_1$, then to $x_2$, and so on. If we suppose that a nonnegative reward $v(i, j)$ is
earned whenever the salesman goes directly from city $i$ to city $j$, then the return
of the choice $\mathbf{x} = (x_1,\dots,x_r)$ is

$$V(\mathbf{x}) = \sum_{i=1}^r v(x_{i-1}, x_i) \quad \text{where } x_0 = 0$$

By letting two permutations be neighbors if one results from an interchange
of two of the coordinates of the other, we can use simulated annealing to
approximate the best path. Namely, start with any permutation $\mathbf{x}$ and let $\mathbf{X}_0 = \mathbf{x}$.
Now, once the $n$th state (that is, permutation) has been determined, $n \ge 0$,
generate one of its neighbors at random [by choosing $I, J$ equally likely to be any
of the $\binom{r}{2}$ pairs $i < j$, $i, j = 1,\dots,r$, and then interchanging the values of the $I$th
and $J$th elements of $\mathbf{X}_n$]. Let the generated neighbor be $\mathbf{y}$. Then if $V(\mathbf{y}) \ge V(\mathbf{X}_n)$,
set $\mathbf{X}_{n+1} = \mathbf{y}$. Otherwise, set $\mathbf{X}_{n+1} = \mathbf{y}$ with probability $(1 + n)^{(V(\mathbf{y}) - V(\mathbf{X}_n))}$, or set
it equal to $\mathbf{X}_n$ otherwise. [Note that we are using $\lambda_n = \log(1 + n)$.] □
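The annealing scheme of Example 10k can be sketched in Python as follows. The reward table `v` (indexed by cities $0, \dots, r$), the starting permutation $1, 2, \dots, r$, and the function names are assumptions made for the illustration; the code also keeps track of the best permutation seen, which is how $V^*$ and a point of $\mathcal{M}$ are estimated. At $n = 0$ the acceptance probability $(1 + n)^{V(\mathbf{y}) - V(\mathbf{X}_n)}$ equals 1, matching $\lambda_0 = \log 1 = 0$.

```python
import itertools
import random

def route_value(v, x):
    """V(x) = sum_{i=1}^r v(x_{i-1}, x_i) with x_0 = 0 (the starting city)."""
    total, prev = 0.0, 0
    for city in x:
        total += v[prev][city]
        prev = city
    return total

def anneal_tsp(v, r, steps, seed=0):
    """Simulated annealing with lambda_n = log(1 + n) and random interchanges."""
    rng = random.Random(seed)
    x = list(range(1, r + 1))                # X_0: any permutation of 1, ..., r
    vx = route_value(v, x)
    best, best_x = vx, list(x)
    for n in range(steps):
        i, j = rng.sample(range(r), 2)       # positions to interchange
        y = list(x)
        y[i], y[j] = y[j], y[i]
        vy = route_value(v, y)
        # move if V(y) >= V(X_n); otherwise with probability (1 + n)**(V(y) - V(X_n))
        if vy >= vx or rng.random() < (1 + n) ** (vy - vx):
            x, vx = y, vy
            if vx > best:
                best, best_x = vx, list(x)
    return best, best_x

def brute_force_best(v, r):
    """Exact maximum of V over all r! permutations (small r only)."""
    return max(route_value(v, p) for p in itertools.permutations(range(1, r + 1)))
```

For small $r$ the annealing result can be compared with the exhaustive maximum computed by `brute_force_best`.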


10.5 The Sampling Importance Resampling Algorithm 


The sampling importance resampling, or SIR, algorithm is a method for generating
a random vector $\mathbf{X}$ whose mass function

$$f(\mathbf{x}) = C_1 f_1(\mathbf{x})$$

is specified up to a multiplicative constant by simulating a Markov chain whose
limiting probabilities are given by a mass function

$$g(\mathbf{x}) = C_2 g_1(\mathbf{x})$$

that is also specified up to a multiplicative constant. It is similar to the
acceptance-rejection technique, where one starts by generating the value of a
random vector $\mathbf{Y}$ with density $g$ and then, if $\mathbf{Y} = \mathbf{y}$, accepting this value with
probability $f(\mathbf{y})/cg(\mathbf{y})$, where $c$ is a constant chosen so that $f(\mathbf{x})/cg(\mathbf{x}) \le 1$
for all $\mathbf{x}$. If the value is not accepted, then the process begins anew, and the
eventually accepted value $\mathbf{X}$ has density $f$. However, as $f$ and $g$ are no longer
totally specified, this approach is not available.

The SIR approach starts by generating $m$ successive states of a Markov chain
whose limiting probability mass function is $g$. Let these state values be denoted
as $\mathbf{y}_1,\dots,\mathbf{y}_m$. Now, define the "weights" $w_i$, $i = 1,\dots,m$, by

$$w_i = \frac{f_1(\mathbf{y}_i)}{g_1(\mathbf{y}_i)}$$

and generate a random vector $\mathbf{X}$ such that

$$P\{\mathbf{X} = \mathbf{y}_j\} = \frac{w_j}{\sum_{i=1}^m w_i}, \quad j = 1,\dots,m$$

We will show that when $m$ is large, the random vector $\mathbf{X}$ has a mass function
approximately equal to $f$.
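A minimal Python sketch of the SIR step follows. For simplicity it assumes the $m$ states are independent draws from $g$ (an independent sequence is a trivial Markov chain with limiting mass function $g$), and it resamples $k$ values rather than a single $\mathbf{X}$; `sir` and its arguments are hypothetical names. `random.Random.choices` draws with probabilities proportional to the supplied weights, which is exactly $P\{\mathbf{X} = \mathbf{y}_j\} = w_j/\sum_i w_i$.

```python
import random

def sir(f1, g1, sample_g, m, k, rng):
    """Sampling importance resampling.  Draw m values from g (i.i.d. draws here,
    the simplest chain whose limiting mass function is g), attach the weights
    w_i = f1(y_i) / g1(y_i), and resample k values with P{X = y_j} = w_j / sum_i w_i."""
    ys = [sample_g(rng) for _ in range(m)]
    ws = [f1(y) / g1(y) for y in ys]
    return rng.choices(ys, weights=ws, k=k)
```

For example, taking $f_1(j) = j$ on $\{1, \dots, 10\}$ and $g$ uniform, the resampled values should show the value 10 with frequency near $10/55$.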


Proposition The distribution of the vector $\mathbf{X}$ obtained by the SIR method
converges as $m \to \infty$ to $f$.


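Proof Let $\mathbf{Y}_i$, $i = 1,\dots,m$, denote the $m$ random vectors generated by the
Markov chain whose limiting mass function is $g$, and let $W_i = f_1(\mathbf{Y}_i)/g_1(\mathbf{Y}_i)$
denote their weights. For a fixed set of vectors $A$, let $I_i = 1$ if $\mathbf{Y}_i \in A$ and let it
equal 0 otherwise. Then

$$P\{\mathbf{X} \in A \mid \mathbf{Y}_i, i = 1,\dots,m\} = \frac{\sum_{i=1}^m I_i W_i}{\sum_{i=1}^m W_i} \tag{10.5}$$

Now, by the Markov chain result of Equation (10.2), we see that as $m \to \infty$,

$$\sum_{i=1}^m I_i W_i/m \to E_g[IW] = E_g[W \mid I = 1]P_g\{I = 1\} = E_g[W \mid \mathbf{Y} \in A]P_g\{\mathbf{Y} \in A\}$$

and

$$\sum_{i=1}^m W_i/m \to E_g[W] = E_g\left[\frac{f_1(\mathbf{Y})}{g_1(\mathbf{Y})}\right] = \int \frac{f(\mathbf{y})/C_1}{g(\mathbf{y})/C_2}\, g(\mathbf{y})\,d\mathbf{y} = \frac{C_2}{C_1}$$

Hence, dividing numerator and denominator of (10.5) by $m$ shows that

$$P\{\mathbf{X} \in A \mid \mathbf{Y}_i, i = 1,\dots,m\} \to \frac{C_1}{C_2}E_g[W \mid \mathbf{Y} \in A]P_g\{\mathbf{Y} \in A\}$$

But,

$$\begin{aligned}
\frac{C_1}{C_2}E_g[W \mid \mathbf{Y} \in A]P_g\{\mathbf{Y} \in A\}
&= \frac{C_1}{C_2}E_g\left[\frac{f_1(\mathbf{Y})}{g_1(\mathbf{Y})}\,\Big|\, \mathbf{Y} \in A\right]P_g\{\mathbf{Y} \in A\} \\
&= \frac{C_1}{C_2}\int_{\mathbf{y} \in A}\frac{f_1(\mathbf{y})}{g_1(\mathbf{y})}\,g(\mathbf{y})\,d\mathbf{y} \\
&= \int_{\mathbf{y} \in A} f(\mathbf{y})\,d\mathbf{y}
\end{aligned}$$

Hence, as $m \to \infty$,

$$P\{\mathbf{X} \in A \mid \mathbf{Y}_i, i = 1,\dots,m\} \to \int_{\mathbf{y} \in A} f(\mathbf{y})\,d\mathbf{y}$$

which implies, by a mathematical result known as Lebesgue's dominated convergence
theorem, that

$$P\{\mathbf{X} \in A\} = E\big[P\{\mathbf{X} \in A \mid \mathbf{Y}_i, i = 1,\dots,m\}\big] \to \int_{\mathbf{y} \in A} f(\mathbf{y})\,d\mathbf{y}$$

and the result is proved. □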


The sampling importance resampling algorithm for approximately generating 
a random vector with mass function f starts by generating random variables with 
a different joint mass function (as in importance sampling) and then resamples 
from this pool of generated values to obtain the random vector. 

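Suppose now that we want to estimate $E_f[h(\mathbf{X})]$ for some function $h$. This
can be accomplished by first generating a large number of successive states of
a Markov chain whose limiting probabilities are given by $g$. If these states are
$\mathbf{y}_1,\dots,\mathbf{y}_m$, then it might seem natural to choose $k$ vectors $\mathbf{X}_1,\dots,\mathbf{X}_k$ having
the probability distribution

$$P\{\mathbf{X} = \mathbf{y}_j\} = \frac{w_j}{\sum_{i=1}^m w_i}, \quad j = 1,\dots,m$$

where $k/m$ is small and $w_i = f_1(\mathbf{y}_i)/g_1(\mathbf{y}_i)$, and then use $\sum_{i=1}^k h(\mathbf{X}_i)/k$ as the
estimator. However, a better approach is not to base the estimator on a sampled
set of $k$ values, but rather to use the entire set of $m$ generated values $\mathbf{y}_1,\dots,\mathbf{y}_m$.
We now show that

$$\frac{1}{\sum_{i=1}^m w_i}\sum_{j=1}^m w_j h(\mathbf{y}_j)$$

is a better estimator of $E_f[h(\mathbf{X})]$ than is $\sum_{i=1}^k h(\mathbf{X}_i)/k$. To show this, note that

$$E[h(\mathbf{X}_i) \mid \mathbf{y}_1,\dots,\mathbf{y}_m] = \frac{1}{\sum_{l=1}^m w_l}\sum_{j=1}^m w_j h(\mathbf{y}_j)$$

and thus

$$E\left[\sum_{i=1}^k \frac{h(\mathbf{X}_i)}{k}\,\Big|\, \mathbf{y}_1,\dots,\mathbf{y}_m\right] = \frac{1}{\sum_{l=1}^m w_l}\sum_{j=1}^m w_j h(\mathbf{y}_j)$$

which, by the conditional variance formula, shows that $\sum_{j=1}^m h(\mathbf{y}_j)w_j/\sum_{i=1}^m w_i$ has the same mean and smaller variance
than $\sum_{i=1}^k h(\mathbf{X}_i)/k$.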
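The variance comparison can be checked empirically. The Python sketch below (hypothetical names; independent draws from $g$ assumed again) repeats the experiment many times and reports the mean and variance of both estimators computed from the same $\mathbf{y}_1, \dots, \mathbf{y}_m$.

```python
import random
import statistics

def compare_estimators(h, f1, g1, sample_g, m, k, reps, seed=0):
    """Repeat the experiment `reps` times; return (mean, variance) of the weighted
    estimator sum_j w_j h(y_j) / sum_i w_i and of the resampled estimator
    sum_{i=1}^k h(X_i) / k, both built from the same y_1, ..., y_m."""
    rng = random.Random(seed)
    weighted, resampled = [], []
    for _ in range(reps):
        ys = [sample_g(rng) for _ in range(m)]
        ws = [f1(y) / g1(y) for y in ys]
        weighted.append(sum(w * h(y) for w, y in zip(ws, ys)) / sum(ws))
        xs = rng.choices(ys, weights=ws, k=k)
        resampled.append(sum(h(x) for x in xs) / k)
    return ((statistics.mean(weighted), statistics.variance(weighted)),
            (statistics.mean(resampled), statistics.variance(resampled)))
```

With $f_1(j) = j$ on $\{1, \dots, 10\}$, $g$ uniform and $h(j) = j$, both estimators center near $E_f[h(X)] = 385/55 = 7$, and the weighted one should show the smaller variance.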

The use of data generated from one distribution to gather information about 
another distribution is particularly useful in Bayesian statistics. 


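Example 10l Suppose that $\mathbf{X}$ is a random vector whose probability distribution
is specified up to a vector of unknown parameters $\boldsymbol{\theta}$. For instance, $\mathbf{X}$
could be a sequence of independent and identically distributed normal random
variables and $\boldsymbol{\theta} = (\theta_1, \theta_2)$, where $\theta_1$ is the mean and $\theta_2$ is the variance of these
random variables. Let $f(\mathbf{x} \mid \boldsymbol{\theta})$ denote the density of $\mathbf{X}$ given $\boldsymbol{\theta}$. Whereas in classical
statistics one assumes that $\boldsymbol{\theta}$ is a vector of unknown constants, in Bayesian
statistics we suppose that it, too, is random and has a specified probability density
function $p(\boldsymbol{\theta})$, called the prior density.

If $\mathbf{X}$ is observed to equal $\mathbf{x}$, then the conditional, also known as the posterior,
density of $\boldsymbol{\theta}$ is given by

$$p(\boldsymbol{\theta} \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \boldsymbol{\theta})p(\boldsymbol{\theta})}{\int f(\mathbf{x} \mid \boldsymbol{\theta})p(\boldsymbol{\theta})\,d\boldsymbol{\theta}}$$

However, in many situations $\int f(\mathbf{x} \mid \boldsymbol{\theta})p(\boldsymbol{\theta})\,d\boldsymbol{\theta}$ cannot easily be computed, and
so the preceding formula cannot be directly used to study the posterior distribution.

One approach to study the properties of the posterior distribution is to start by
generating random vectors $\boldsymbol{\theta}$ from the prior density $p$ and then use the resulting
data to gather information about the posterior density $p(\boldsymbol{\theta} \mid \mathbf{x})$. If we suppose
that the prior density $p(\boldsymbol{\theta})$ is completely specified and can be directly generated
from, then we can use the SIR algorithm with

$$\begin{aligned}
f_1(\boldsymbol{\theta}) &= f(\mathbf{x} \mid \boldsymbol{\theta})p(\boldsymbol{\theta}) \\
g(\boldsymbol{\theta}) = g_1(\boldsymbol{\theta}) &= p(\boldsymbol{\theta}) \\
w(\boldsymbol{\theta}) &= f(\mathbf{x} \mid \boldsymbol{\theta})
\end{aligned}$$

To begin, generate a large number $m$ of random vectors from the prior density
$p(\boldsymbol{\theta})$. Let their values be $\boldsymbol{\theta}_1,\dots,\boldsymbol{\theta}_m$. We can now estimate any function of the
form $E[h(\boldsymbol{\theta}) \mid \mathbf{x}]$ by the estimator

$$\sum_{j=1}^m \alpha_j h(\boldsymbol{\theta}_j), \quad \text{where } \alpha_j = \frac{f(\mathbf{x} \mid \boldsymbol{\theta}_j)}{\sum_{i=1}^m f(\mathbf{x} \mid \boldsymbol{\theta}_i)}$$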
For instance, for any set $A$ we would use

$$\sum_{j=1}^m \alpha_j I\{\boldsymbol{\theta}_j \in A\} \quad \text{to estimate } P\{\boldsymbol{\theta} \in A \mid \mathbf{x}\}$$

where $I\{\boldsymbol{\theta}_j \in A\}$ is 1 if $\boldsymbol{\theta}_j \in A$ and is 0 otherwise.

In cases where the dimension of $\boldsymbol{\theta}$ is small, we can use the generated data
from the prior along with their weights to graphically explore the posterior. For
instance, if $\boldsymbol{\theta}$ is two-dimensional, then we can plot the prior generated values
$\boldsymbol{\theta}_1,\dots,\boldsymbol{\theta}_m$ on a two-dimensional graph in a manner that takes the weights of
these points into account. For instance, we could center a dot on each of these $m$
points, with the area of the dot on the point $\boldsymbol{\theta}_j$ being proportional to its weight
$f(\mathbf{x} \mid \boldsymbol{\theta}_j)$. Another possibility would be to let all the dots be of the same size but
to let the darkness of the dot depend on its weight in a linear additive fashion.
That is, for instance, if $m = 3$ and $\boldsymbol{\theta}_1 = \boldsymbol{\theta}_2$, $f(\mathbf{x} \mid \boldsymbol{\theta}_3) = 2f(\mathbf{x} \mid \boldsymbol{\theta}_1)$, then the colors
of the dots at $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_3$ should be the same.

If the prior density $p$ is only specified up to a constant, or if it is hard to
directly generate random vectors from it, then we can generate a Markov chain
having $p$ as the limiting density, and then continue as before. □
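As an illustration, the Python sketch below applies this estimator in an assumed setting chosen because the answer is known in closed form: $X_1, \dots, X_n$ independent normal with unknown mean $\theta$ and variance 1, and prior $\theta \sim N(0, 1)$, for which $E[\theta \mid \mathbf{x}] = \sum_i x_i/(n + 1)$. The weights are computed on the log scale and shifted by their maximum before exponentiating; the shift cancels in $\alpha_j$ and only guards against underflow. The function name is hypothetical.

```python
import math
import random

def posterior_mean_sir(data, m=100000, seed=0):
    """Estimate E[theta | x] by sum_j alpha_j * theta_j, with theta_1..theta_m
    drawn from the prior and alpha_j proportional to f(x | theta_j).
    Assumed model: X_i | theta ~ N(theta, 1) independently, prior theta ~ N(0, 1)."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # log f(x | theta) up to an additive constant, which cancels in alpha_j
    logw = [-0.5 * sum((xi - t) ** 2 for xi in data) for t in thetas]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    return sum(wj * t for wj, t in zip(w, thetas)) / sum(w)
```

For the data $(0.8, 1.2, 1.0, 0.6)$ the exact posterior mean is $3.6/5 = 0.72$, which the estimate should approach.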


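Remark The estimator of $E[h(\boldsymbol{\theta}) \mid \mathbf{x}]$ presented in Example 10l could also
have been derived by an importance sampling type argument. Since

$$p(\boldsymbol{\theta} \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \boldsymbol{\theta})p(\boldsymbol{\theta})}{C}$$

where

$$C = \int f(\mathbf{x} \mid \boldsymbol{\theta})p(\boldsymbol{\theta})\,d\boldsymbol{\theta} = E[f(\mathbf{x} \mid \boldsymbol{\theta})]$$

we have that

$$\begin{aligned}
E[h(\boldsymbol{\theta}) \mid \mathbf{x}] &= \int h(\boldsymbol{\theta})\,p(\boldsymbol{\theta} \mid \mathbf{x})\,d\boldsymbol{\theta} \\
&= \int \frac{h(\boldsymbol{\theta})f(\mathbf{x} \mid \boldsymbol{\theta})}{C}\,p(\boldsymbol{\theta})\,d\boldsymbol{\theta} \\
&= \frac{E[f(\mathbf{x} \mid \boldsymbol{\theta})h(\boldsymbol{\theta})]}{C} \\
&= \frac{E[f(\mathbf{x} \mid \boldsymbol{\theta})h(\boldsymbol{\theta})]}{E[f(\mathbf{x} \mid \boldsymbol{\theta})]}
\end{aligned}$$

where the expectations are taken with $\boldsymbol{\theta}$ having the prior density $p$. But the preceding suggests the estimator

$$\frac{\sum_{i=1}^m f(\mathbf{x} \mid \boldsymbol{\theta}_i)h(\boldsymbol{\theta}_i)}{\sum_{i=1}^m f(\mathbf{x} \mid \boldsymbol{\theta}_i)}$$

which is precisely the one given in Example 10l. □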
Exercises 
1. Let $\pi_j$, $j = 1, \ldots, N$, denote the stationary probabilities of a Markov chain. Show that if $P\{X_0 = j\} = \pi_j$, $j = 1, \ldots, N$, then

$$P\{X_n = j\} = \pi_j, \quad \text{for all } n, j$$


2. Let $Q$ be a symmetric transition probability matrix, that is, $q_{ij} = q_{ji}$ for all $i, j$. Consider a Markov chain which, when the present state is $i$, generates the value of a random variable $X$ such that $P\{X = j\} = q_{ij}$, and if $X = j$, then either moves to state $j$ with probability $b_j/(b_i + b_j)$, or remains in state $i$ otherwise, where $b_j$, $j = 1, \ldots, N$, are specified positive numbers. Show that the resulting Markov chain is time reversible with limiting probabilities $\pi_j = Cb_j$, $j = 1, \ldots, N$.


3. Let $S$ be the set of all $n \times n$ matrices $A$ whose elements are either 0 or 1. (Thus, there are $2^{n^2}$ matrices in $S$.) The pair of elements $a_{i,j}$ and $a_{r,s}$ of the matrix $A$ are said to be neighbors if $|r - i| + |s - j| = 1$. (Thus, for instance, the neighbors of $a_{2,2}$ are $a_{1,2}$, $a_{2,1}$, $a_{2,3}$, and $a_{3,2}$.) Let $\mathcal{N}$ denote all the pairs of neighboring elements of $A$. The “Ising energy” of the matrix $A$ is defined by

$$H(A) = -\sum_{\mathcal{N}} a_{i,j}a_{r,s}$$

where the sum is over all the pairs of neighboring elements. Give a method for randomly choosing such a matrix $A$ according to the probability mass function

$$p(A) = \frac{\exp\{-\lambda H(A)\}}{\sum_{A \in S}\exp\{-\lambda H(A)\}}, \quad A \in S$$

where $\lambda$ is a specified positive constant.

[Hint: Let the matrices $A$ and $B$ be neighbors if $A - B$ has only one nonzero element.]


4. Consider a system of 20 independent components, with component $i$ being functional with probability $0.5 + i/50$, $i = 1, \ldots, 20$. Let $X$ denote the number of functional components. Use simulation to estimate the conditional probability mass function $P\{X = i \mid X \le 5\}$, $i = 1, 2, 3, 4, 5$.
5. Suppose that the random variables $X$ and $Y$ both take on values in the interval $(0, B)$. Suppose that the conditional density of $X$ given that $Y = y$ is

$$f(x|y) = C(y)e^{-xy}, \quad 0 < x < B$$

and the conditional density of $Y$ given that $X = x$ is

$$f(y|x) = C(x)e^{-xy}, \quad 0 < y < B$$

Give a method for approximately simulating the vector $X, Y$. Run a simulation to estimate (a) $E[X]$ and (b) $E[XY]$.


6. Give an efficient method for generating nine uniform points on $(0, 1)$ conditional on the event that no two of them are within 0.1 of each other. (It can be shown that if $n$ points are independent and uniformly distributed on $(0, 1)$, then the probability that no two of them are within $d$ of each other is, for $0 < d < 1/(n-1)$, $[1 - (n-1)d]^n$.)


7. In Example 10d, it can be shown that the limiting mass function of the number of customers at the $m+1$ servers is

$$p(n_1, \ldots, n_m, n_{m+1}) = C\prod_{i=1}^{m+1} P_i(n_i), \qquad \sum_{i=1}^{m+1} n_i = r$$

where for each $i = 1, \ldots, m+1$, $P_i(n)$, $n = 0, \ldots, r$, is a probability mass function. Let $e_k$ be the $m+1$ component vector with a 1 in the $k$th position and zeros elsewhere. For a vector $n = (n_1, \ldots, n_{m+1})$, let

$$q(n, n - e_i + e_j) = \frac{I(n_i > 0)}{(m+1)\sum_{k=1}^{m+1} I(n_k > 0)}$$

In words, $q$ is the transition probability matrix of a Markov chain that at each step randomly selects a nonempty server and then sends one of its customers to a randomly chosen server. Using this $q$ function, give the Hastings–Metropolis algorithm for generating a Markov chain having $p(n_1, \ldots, n_m, n_{m+1})$ as its limiting mass function.


8. Let $X_i$, $i = 1, 2, 3$, be independent exponentials with mean 1. Run a simulation study to estimate

(a) $E[X_1 + 2X_2 + 3X_3 \mid X_1 + 2X_2 + 3X_3 > 15]$.
(b) $E[X_1 + 2X_2 + 3X_3 \mid X_1 + 2X_2 + 3X_3 < 1]$.


9. A random selection of $m$ balls is to be made from an urn that contains $n$ balls, $n_i$ of which have color type $i = 1, \ldots, r$, $\sum_{i=1}^{r} n_i = n$. Let $X_i$ denote the number of withdrawn balls that have color type $i$. Give an efficient procedure for simulating $X_1, \ldots, X_r$ conditional on the event that all $r$ color types are represented in the random selection. Assume that the probability that all color types are represented in the selection is a small positive number.


10. Suppose the joint density of $X, Y, Z$ is given by

$$f(x, y, z) = Ce^{-(x + y + z + axy + bxz + cyz)}, \quad x > 0, \; y > 0, \; z > 0$$

where $a, b, c$ are specified nonnegative constants, and $C$ does not depend on $x, y, z$. Explain how we can simulate the vector $X, Y, Z$, and run a simulation to estimate $E[XYZ]$ when $a = b = c = 1$.


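11. Suppose that for random variables $X$, $Y$, $N$

$$P\{X = i, y \le Y \le y + dy, N = n\} \approx C\binom{n}{i}y^{i+\alpha-1}(1 - y)^{n-i+\beta-1}e^{-\lambda}\frac{\lambda^n}{n!}\,dy$$

where $i = 0, \ldots, n$, $n = 0, 1, \ldots$, $0 \le y \le 1$, and where $\alpha$, $\beta$, $\lambda$ are specified constants. Run a simulation to estimate $E[X]$, $E[Y]$, and $E[N]$ when $\alpha = 2$, $\beta = 3$, $\lambda = 4$.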


12. Use the SIR algorithm to generate a permutation of $1, 2, \ldots, 100$ whose distribution is approximately that of a random permutation $X_1, \ldots, X_{100}$ conditioned on the event that $\sum_j jX_j > 285{,}000$.


13. Let $X^1, X^2, \ldots, X^n$ be random points in $\mathcal{C}$, the circle of radius 1 centered at the origin. Suppose that for some $r$, $0 < r < 1$, their joint density function is given by

$$f(x^1, \ldots, x^n) = K\exp\{-\beta t(r : x^1, \ldots, x^n)\}, \quad x^i \in \mathcal{C}, \; i = 1, \ldots, n$$

where $t(r : x^1, \ldots, x^n)$ is the number of the $\binom{n}{2}$ pairs of points $x^i, x^j$, $i \ne j$, that are within a distance $r$ of each other, and $0 < \beta < \infty$. (Note that $\beta = \infty$ corresponds to the case where the $X^i$ are uniformly distributed on the circle subject to the constraint that no two points are within a distance $r$ of each other.) Explain how you can use the SIR algorithm to approximately generate these random points. If $r$ and $\beta$ were both large, would this be an efficient algorithm?


14. Generate 100 random numbers $U_{0,k}$, $k = 1, \ldots, 10$, and $U_{i,j}$, $i \ne j$, $i, j = 1, \ldots, 10$. Now, consider a traveling salesman problem in which the salesman starts at city 0 and must travel in turn to each of the 10 cities $1, \ldots, 10$ according to some permutation of $1, \ldots, 10$. Let $U_{i,j}$ be the reward earned by the salesman when he goes directly from city $i$ to city $j$. Use simulated annealing to approximate the maximal possible return of the salesman.
Some Additional Topics


Introduction 


In this chapter we present some additional topics that are somewhat more spe- 
cialized than those considered in earlier chapters. In Section 11.1 we present a 
highly efficient technique, called the alias method, for generating discrete random 
variables. In Section 11.2 we define and then present a method for simulating 
a two-dimensional Poisson process. In Section 11.3 we present a probability 
identity concerning the sum of Bernoulli random variables and show how its 
use can lead to dramatic variance reductions when estimating small probabilities. Finally, Section 11.4 considers how to efficiently estimate probabilities and expectations relating to first passage times of Markov processes.


11.1 The Alias Method for Generating Discrete 
Random Variables 


In this section we study a technique for generating discrete random variables 
which, although requiring some setup time, is very fast to implement. 

In what follows, the quantities $P$, $P^{(k)}$, $Q^{(k)}$, $k \le n-1$, represent probability mass functions on the integers $1, 2, \ldots, n$; that is, they are $n$-vectors of nonnegative numbers summing to 1. In addition, the vector $P^{(k)}$ has at most $k$ nonzero components, and each of the $Q^{(k)}$ has at most two nonzero components. We show that any probability mass function $P$ can be represented as an equally weighted mixture of $n-1$ probability mass functions $Q$ (each having at most two nonzero components). That is, we show, for suitably defined $Q^{(1)}, \ldots, Q^{(n-1)}$, that $P$ can be expressed as

$$P = \frac{1}{n-1}\sum_{k=1}^{n-1} Q^{(k)} \tag{11.1}$$


As a prelude to presenting the method for obtaining this representation, we need 
the following simple lemma whose proof is left as an exercise. 


Lemma Let $P = \{P_i, i = 1, \ldots, n\}$ denote a probability mass function. Then

(a) there exists an $i$, $1 \le i \le n$, such that $P_i < 1/(n-1)$, and
(b) for this $i$ there exists a $j$, $j \ne i$, such that $P_i + P_j \ge 1/(n-1)$.


Before presenting the general technique for obtaining the representation (11.1), 
let us illustrate it by an example. 


Example 11a Consider the three-point distribution $P$ with $P_1 = \frac{7}{16}$, $P_2 = \frac{1}{2}$, $P_3 = \frac{1}{16}$. We start by choosing $i$ and $j$ satisfying the conditions of the preceding lemma. Since $P_3 < \frac{1}{2}$ and $P_3 + P_2 > \frac{1}{2}$, we can work with $i = 3$ and $j = 2$. We now define a two-point mass function $Q^{(1)}$, putting all its weight on 3 and 2 and such that $P$ is expressible as an equally weighted mixture between $Q^{(1)}$ and a second two-point mass function $Q^{(2)}$. In addition, all the mass of point 3 is contained in $Q^{(1)}$. As we have

$$P_j = \frac{1}{2}\left(Q^{(1)}_j + Q^{(2)}_j\right), \quad j = 1, 2, 3 \tag{11.2}$$

and $Q^{(2)}_3$ is supposed to equal 0, we must therefore take

$$Q^{(1)}_3 = 2P_3 = \frac{1}{8}, \quad Q^{(1)}_2 = 1 - Q^{(1)}_3 = \frac{7}{8}, \quad Q^{(1)}_1 = 0$$

To satisfy (11.2), we must then set

$$Q^{(2)}_3 = 0, \quad Q^{(2)}_2 = 2P_2 - \frac{7}{8} = \frac{1}{8}, \quad Q^{(2)}_1 = 2P_1 = \frac{7}{8}$$

Hence we have the desired representation in this case. Suppose now that the original distribution was the following four-point mass function:

$$P_1 = \frac{7}{16}, \quad P_2 = \frac{1}{4}, \quad P_3 = \frac{1}{8}, \quad P_4 = \frac{3}{16}$$

Now $P_3 < \frac{1}{3}$ and $P_3 + P_1 > \frac{1}{3}$. Hence our initial two-point mass function, $Q^{(1)}$, concentrates on points 3 and 1 (giving no weight to 2 and 4). Because the final representation gives weight $\frac{1}{3}$ to $Q^{(1)}$ and in addition the other $Q^{(j)}$, $j = 2, 3$, do not give any mass to the value 3, we must have that

$$\frac{1}{3}Q^{(1)}_3 = P_3 = \frac{1}{8}$$

Hence

$$Q^{(1)}_3 = \frac{3}{8}, \quad Q^{(1)}_1 = 1 - \frac{3}{8} = \frac{5}{8}$$

Also, we can write

$$P = \frac{1}{3}Q^{(1)} + \frac{2}{3}P^{(3)}$$

where $P^{(3)}$, to satisfy the above, must be the vector

$$P^{(3)}_1 = \frac{3}{2}\left(P_1 - \frac{1}{3}Q^{(1)}_1\right) = \frac{11}{32}, \quad P^{(3)}_2 = \frac{3}{2}P_2 = \frac{3}{8}, \quad P^{(3)}_3 = 0, \quad P^{(3)}_4 = \frac{3}{2}P_4 = \frac{9}{32}$$

Note that $P^{(3)}$ gives no mass to the value 3. We can now express the mass function $P^{(3)}$ as an equally weighted mixture of two-point mass functions $Q^{(2)}$ and $Q^{(3)}$, and we end up with

$$P = \frac{1}{3}Q^{(1)} + \frac{2}{3}\left(\frac{1}{2}Q^{(2)} + \frac{1}{2}Q^{(3)}\right) = \frac{1}{3}\left(Q^{(1)} + Q^{(2)} + Q^{(3)}\right)$$

(We leave it as an exercise for the reader to fill in the details.) □


The above example outlines the following general procedure for writing the $n$-point mass function $P$ in the form (11.1), where each of the $Q^{(k)}$ are mass functions giving all their mass to at most two points. To start, we choose $i$ and $j$ satisfying the conditions of the lemma. We now define the mass function $Q^{(1)}$ concentrating on the points $i$ and $j$ and which contains all the mass for point $i$ by noting that in the representation (11.1) $Q^{(k)}_i = 0$ for $k = 2, \ldots, n-1$, implying that

$$Q^{(1)}_i = (n-1)P_i \quad \text{and so} \quad Q^{(1)}_j = 1 - (n-1)P_i$$

Writing

$$P = \frac{1}{n-1}Q^{(1)} + \frac{n-2}{n-1}P^{(n-1)} \tag{11.3}$$


where $P^{(n-1)}$ represents the remaining mass, we see that

$$P^{(n-1)}_i = 0$$

$$P^{(n-1)}_j = \frac{n-1}{n-2}\left(P_j - \frac{1}{n-1}Q^{(1)}_j\right) = \frac{n-1}{n-2}\left(P_i + P_j - \frac{1}{n-1}\right)$$

$$P^{(n-1)}_k = \frac{n-1}{n-2}P_k, \quad k \ne i \text{ or } j$$

That the above is indeed a probability mass function is easily checked; for example, the nonnegativity of $P^{(n-1)}_j$ follows from the fact that $j$ was chosen so that $P_i + P_j \ge 1/(n-1)$.
We may now repeat the above procedure on the $(n-1)$-point probability mass function $P^{(n-1)}$ to obtain

$$P^{(n-1)} = \frac{1}{n-2}Q^{(2)} + \frac{n-3}{n-2}P^{(n-2)}$$

and thus from (11.3) we have

$$P = \frac{1}{n-1}Q^{(1)} + \frac{1}{n-1}Q^{(2)} + \frac{n-3}{n-1}P^{(n-2)}$$

We now repeat the procedure on $P^{(n-2)}$ and so on until we finally obtain

$$P = \frac{1}{n-1}\left(Q^{(1)} + \cdots + Q^{(n-1)}\right)$$


In this way we are able to represent $P$ as an equally weighted mixture of $n-1$ two-point mass functions. We can now easily simulate from $P$ by first generating a random integer $N$ equally likely to be any of $1, 2, \ldots, n-1$. If the resulting value $N$ is such that $Q^{(N)}$ puts positive weight only on the points $i_N$ and $j_N$, we can set $X$ equal to $i_N$ if a second random number is less than $Q^{(N)}_{i_N}$ and equal to $j_N$ otherwise. The random variable $X$ will have probability mass function $P$. That is, we have the following procedure for simulating from $P$.
STEP 1: Generate $U_1$ and set $N = 1 + \text{Int}[(n-1)U_1]$.
STEP 2: Generate $U_2$ and set

$$X = \begin{cases} i_N & \text{if } U_2 < Q^{(N)}_{i_N} \\ j_N & \text{otherwise} \end{cases}$$


Remarks

1. The above is called the alias method because by a renumbering of the $Q$'s we can always arrange things so that for each $k$, $Q^{(k)}_k > 0$. (That is, we can arrange things so that the $k$th two-point mass function gives positive weight to the value $k$.) Hence, the procedure calls for simulating $N$, equally likely to be $1, 2, \ldots, n-1$, and then if $N = k$ it either accepts $k$ as the value of $X$, or it accepts for the value of $X$ the “alias” of $k$ (namely, the other value that $Q^{(k)}$ gives positive weight).

2. Actually, it is not necessary to generate a new random number in Step 2. Because $N - 1$ is the integer part of $(n-1)U_1$, it follows that the remainder $(n-1)U_1 - (N-1)$ is independent of $N$, and is uniformly distributed on $(0, 1)$. Hence, rather than generating a new random number $U_2$ in Step 2, we can use $(n-1)U_1 - (N-1)$. □
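A Python sketch of the setup and of Steps 1 and 2 follows. It uses the single-random-number variant of Remark 2. This is an illustration under stated assumptions, not the book's program:

- `alias_setup` and `alias_sample` are hypothetical names.
- Indices are 0-based.
- The pairing rule (the smallest remaining scaled mass paired with the largest remaining one) is one convenient way to satisfy the lemma at every stage.

Scaling the masses by $n-1$ turns each stage into the same step: take all of point $i$'s remaining mass, then top $Q^{(k)}$ up to total mass 1 from point $j$.

```python
import random

def alias_setup(p):
    """Write p (length n >= 2) as an equal mixture of n - 1 two-point mass
    functions. Returns triples (i, j, q): component k puts mass q on point i
    and mass 1 - q on point j (0-based indices)."""
    n = len(p)
    if n < 2:
        raise ValueError("need at least two points")
    # Scaled masses: each of the n - 1 components carries total weight 1.
    w = {x: (n - 1) * px for x, px in enumerate(p)}
    components = []
    for _ in range(n - 1):
        i = min(w, key=w.get)                               # lemma (a): w[i] <= 1
        j = max((x for x in w if x != i), key=w.get)        # lemma (b): w[i] + w[j] >= 1
        q = w[i]
        components.append((i, j, q))
        w[j] = max(w[j] - (1.0 - q), 0.0)   # j supplies the rest of this component
        del w[i]                            # all of i's mass is now used up
    return components

def alias_sample(components, rng=random):
    """Steps 1-2, reusing the fractional part of (n - 1)U as in Remark 2."""
    m = len(components)          # m = n - 1
    v = m * rng.random()
    k = int(v)                   # N - 1 in the text's notation
    u = v - k                    # uniform on (0, 1), independent of k
    i, j, q = components[k]
    return i if u < q else j
```

On the four-point distribution of Example 11a, this pairing rule first combines point 3 with point 1 (indices 2 and 0 here) and gives $Q^{(1)}_3 = 3/8$, as in the text.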


11.2 Simulating a Two-Dimensional Poisson Process 


A process consisting of randomly occurring points in the plane is said to constitute a two-dimensional Poisson process having rate $\lambda$, $\lambda > 0$, if

1. The number of points occurring in any given region of area $A$ is Poisson distributed with mean $\lambda A$.
2. The numbers of points occurring in disjoint regions are independent.
2. The numbers of points occurring in disjoint regions are independent. 


For a given fixed point 0 in the plane, we now show how to simulate points, according to a two-dimensional Poisson process with rate $\lambda$, that occur in a circular region of radius $r$ centered at 0.

Let $C(a)$ denote the circle of radius $a$ centered at 0, and note that, from Condition 1, the number of points in $C(a)$ is Poisson distributed with mean $\lambda\pi a^2$. Let $R_i$, $i \ge 1$, denote the distance from the origin 0 to its $i$th nearest point (Figure 11.1). Then

$$P\{\pi R_1^2 > x\} = P\left\{R_1 > \sqrt{x/\pi}\right\} = P\left\{\text{no points in } C\left(\sqrt{x/\pi}\right)\right\} = e^{-\lambda x}$$
Figure 11.1. Two-Dimensional Poisson Process.


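where the last equality uses the fact that the area of $C(\sqrt{x/\pi})$ is $x$. Also, with $C(b) - C(a)$ denoting the region between $C(b)$ and $C(a)$, $a < b$, we have

$$\begin{aligned} P\{\pi R_2^2 - \pi R_1^2 > x \mid R_1 = a\} &= P\left\{\text{no points in } C\left(\sqrt{\tfrac{x + \pi a^2}{\pi}}\right) - C(a) \,\middle|\, R_1 = a\right\} \\ &= P\left\{\text{no points in } C\left(\sqrt{\tfrac{x + \pi a^2}{\pi}}\right) - C(a)\right\} \quad \text{by Condition 2} \\ &= e^{-\lambda x} \end{aligned}$$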
In fact, the same argument can be repeated continually to obtain the following proposition.

Proposition With $R_0 = 0$, $\pi R_i^2 - \pi R_{i-1}^2$, $i \ge 1$, are independent exponential random variables each having rate $\lambda$.


In other words, the amount of area that need be traversed to encounter a Poisson point is exponential with rate $\lambda$. Since, by symmetry, the respective angles of the Poisson points are independent and uniformly distributed over $(0, 2\pi)$, we thus have the following algorithm for simulating the Poisson process over a circular region of radius $r$ about 0.
STEP 1: Generate independent exponentials with rate $\lambda$, $X_1, X_2, \ldots$, stopping at

$$N = \min\{n : X_1 + \cdots + X_n > \pi r^2\}$$

STEP 2: If $N = 1$ stop; there are no points in $C(r)$. Otherwise, for $i = 1, \ldots, N-1$, set

$$R_i = \sqrt{\frac{X_1 + \cdots + X_i}{\pi}}$$

(that is, $\pi R_i^2 = X_1 + \cdots + X_i$).
STEP 3: Generate random numbers $U_1, \ldots, U_{N-1}$.
STEP 4: The polar coordinates of the $N - 1$ Poisson points are

$$(R_i, 2\pi U_i), \quad i = 1, \ldots, N-1$$
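Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration, not code from the text. The function name `poisson_points_in_disk` is hypothetical, and it returns the points in polar form $(R_i, \text{angle}_i)$. Step 3 is interleaved with Steps 1 and 2, which does not change the joint distribution.

```python
import math
import random

def poisson_points_in_disk(lam, r, rng=random):
    """Polar coordinates of a rate-lam planar Poisson process inside the disk
    of radius r about the origin, obtained by fanning out in area."""
    points = []
    area = rng.expovariate(lam)                  # X_1 (rate lam, mean 1/lam)
    while area <= math.pi * r * r:               # stop once the partial sum exceeds pi r^2
        radius = math.sqrt(area / math.pi)       # pi R_i^2 = X_1 + ... + X_i
        angle = 2.0 * math.pi * rng.random()     # uniform on (0, 2 pi)
        points.append((radius, angle))
        area += rng.expovariate(lam)             # add the next exponential area
    return points
```

The number of returned points should average about $\lambda\pi r^2$, and the radii come out in increasing order.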


The above algorithm can be considered as the fanning out from a circle centered at 0 with a radius that expands continuously from 0 to $r$. The successive radii at which points are encountered are simulated by using the result that the additional area necessary to explore until one encounters another point is always exponentially distributed with rate $\lambda$. This fanning-out technique can also be used to simulate the process over noncircular regions. For example, consider a nonnegative function $f(x)$ and suppose that we are interested in simulating the Poisson process in the region between the $x$-axis and the function $f$ (Figure 11.2) with $x$ going from 0 to $T$. To do so, we can start at the left-hand edge and fan vertically to the right by considering the successive areas encountered. Specifically, if $X_1 < X_2 < \cdots$ denote the successive projections of the Poisson process points on the $x$-axis, it follows in exactly the same manner as before that (with $X_0 = 0$)

$$\int_{X_{i-1}}^{X_i} f(x)\,dx, \quad i = 1, \ldots, \quad \text{are independent exponentials with rate } \lambda$$

Figure 11.2. Graph of f.
Hence, we can simulate the Poisson points by generating independent exponential random variables with rate $\lambda$, $W_1, W_2, \ldots$, stopping at

$$N = \min\left\{n : W_1 + \cdots + W_n > \int_0^T f(x)\,dx\right\}$$

We now determine $X_1, \ldots, X_{N-1}$ by using the equations

$$\int_0^{X_1} f(x)\,dx = W_1, \quad \int_{X_1}^{X_2} f(x)\,dx = W_2, \quad \ldots, \quad \int_{X_{N-2}}^{X_{N-1}} f(x)\,dx = W_{N-1}$$

Because the projection on the $y$-axis of the point whose $x$-coordinate is $X_i$ is clearly uniformly distributed over $(0, f(X_i))$, it thus follows that if we now generate random numbers $U_1, \ldots, U_{N-1}$, then the simulated Poisson points are, in rectangular coordinates, $(X_i, U_i f(X_i))$, $i = 1, \ldots, N-1$.

The above procedure is most useful when $f$ is regular enough so that the above equations can be efficiently solved for the values of $X_i$. For example, if $f(x) = c$ (and so the region is a rectangle), we can express $X_i$ as

$$X_i = \frac{W_1 + \cdots + W_i}{c}$$

and the Poisson points are

$$(X_i, cU_i), \quad i = 1, \ldots, N-1$$
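Here is a sketch of the fanning-out procedure for the region under $f$. It assumes that the cumulative area $F(x) = \int_0^x f(t)\,dt$ and its inverse are available in closed form; this is the "regular enough" case mentioned above. The successive equations $\int_{X_{i-1}}^{X_i} f = W_i$ are then equivalent to $F(X_i) = W_1 + \cdots + W_i$. The name `poisson_points_under_curve` and its callable arguments are hypothetical, not from the text.

```python
import math
import random

def poisson_points_under_curve(lam, T, f, cum_area, inv_cum_area, rng=random):
    """Rectangular coordinates of a rate-lam planar Poisson process between the
    x-axis and f on (0, T). cum_area(x) is the integral of f over (0, x), and
    inv_cum_area is its inverse (assumed known in closed form)."""
    total = cum_area(T)
    points = []
    s = rng.expovariate(lam)              # W_1 + ... + W_i
    while s <= total:
        x = inv_cum_area(s)               # solves cum_area(X_i) = W_1 + ... + W_i
        y = rng.random() * f(x)           # height uniform on (0, f(X_i))
        points.append((x, y))
        s += rng.expovariate(lam)
    return points
```

For the rectangle case $f(x) = c$, pass `cum_area=lambda x: c * x` and `inv_cum_area=lambda s: s / c`. This reproduces $X_i = (W_1 + \cdots + W_i)/c$ and the points $(X_i, cU_i)$.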


11.3 Simulation Applications of an Identity for Sums 
of Bernoulli Random Variables 


Let $X_1, \ldots, X_m$ be Bernoulli random variables such that

$$P\{X_i = 1\} = \lambda_i = 1 - P\{X_i = 0\}$$

Also, let $S = \sum_{i=1}^{m} X_i$ and set $\lambda = E[S] = \sum_{i=1}^{m} \lambda_i$. Let $R$ be an arbitrary random variable, and suppose that $I$, independent of $R, X_1, \ldots, X_m$, is such that

$$P\{I = i\} = 1/m, \quad i = 1, \ldots, m$$

That is, $I$ is a discrete uniform random variable on $1, \ldots, m$ that is independent of the other random variables.
The following identity is the key to the results of this section.

Proposition

(a) $P\{I = i \mid X_I = 1\} = \lambda_i/\lambda$
(b) $E[SR] = \lambda E[R \mid X_I = 1]$
(c) $P\{S > 0\} = \lambda E[1/S \mid X_I = 1]$


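Proof To prove (a), note that

$$P\{I = i \mid X_I = 1\} = \frac{P\{X_I = 1 \mid I = i\}P\{I = i\}}{\sum_j P\{X_I = 1 \mid I = j\}P\{I = j\}}$$

Now,

$$P\{X_I = 1 \mid I = i\} = P\{X_i = 1 \mid I = i\} = P\{X_i = 1\} = \lambda_i \quad \text{by independence}$$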


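which completes the proof of (a). To prove (b), reason as follows:

$$\begin{aligned} E[SR] &= E\left[R\sum_i X_i\right] = \sum_i E[RX_i] \\ &= \sum_i \left\{E[RX_i \mid X_i = 1]\lambda_i + E[RX_i \mid X_i = 0](1 - \lambda_i)\right\} \\ &= \sum_i \lambda_i E[R \mid X_i = 1] \end{aligned} \tag{11.4}$$

Also,

$$\begin{aligned} E[R \mid X_I = 1] &= \sum_i E[R \mid X_I = 1, I = i]P\{I = i \mid X_I = 1\} \\ &= \sum_i E[R \mid X_i = 1, I = i]\lambda_i/\lambda \quad \text{by (a)} \\ &= \sum_i E[R \mid X_i = 1]\lambda_i/\lambda \end{aligned} \tag{11.5}$$

Combining Equations (11.4) and (11.5) proves (b).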
To prove (c), define $R$ to equal 0 if $S = 0$ and to equal $1/S$ if $S > 0$. Then

$$E[SR] = P\{S > 0\} \quad \text{and} \quad E[R \mid X_I = 1] = E\left[\frac{1}{S} \,\middle|\, X_I = 1\right]$$

and so (c) follows directly from (b). □
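For small $m$, part (c) can be checked exactly by enumerating a joint distribution. The sketch below is an illustration we add; it is not from the text. The function name `check_proposition` and the example distribution are hypothetical. It uses exact rational arithmetic and a deliberately dependent joint law, since the proposition places no independence requirement on $X_1, \ldots, X_m$. The right side is computed by conditioning on $I$, using part (a) together with the independence of $I$ from the $X$'s.

```python
from fractions import Fraction

def check_proposition(joint):
    """joint maps 0/1 tuples (x_1, ..., x_m) to probabilities (any dependence).
    Returns (P{S > 0}, lambda * E[1/S | X_I = 1]) computed exactly."""
    m = len(next(iter(joint)))
    lam_i = [sum(p for xs, p in joint.items() if xs[i]) for i in range(m)]
    lam = sum(lam_i)
    p_pos = sum(p for xs, p in joint.items() if sum(xs) > 0)
    # E[1/S | X_I = 1] = sum_i P{I = i | X_I = 1} E[1/S | X_i = 1],
    # with P{I = i | X_I = 1} = lam_i / lam by part (a).
    cond = sum(
        (lam_i[i] / lam)
        * sum(p / sum(xs) for xs, p in joint.items() if xs[i]) / lam_i[i]
        for i in range(m) if lam_i[i] > 0
    )
    return p_pos, lam * cond

F = Fraction
joint = {(0, 0, 0): F(1, 2), (1, 1, 0): F(1, 8), (1, 0, 1): F(1, 8),
         (0, 1, 1): F(1, 8), (1, 1, 1): F(1, 8)}
lhs, rhs = check_proposition(joint)
```

Both sides come out to exactly $1/2$ for this dependent example.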


We will now use the preceding proposition to estimate (a) the failure probability 
of a system, and (b) the probability that a specified pattern occurs within a given 
time frame. 


Example 11b Consider the model of Example 8b, which is concerned with a system composed of $n$ independent components, and suppose that we want to estimate the probability that the system is failed, when this probability is very small. Now, for any system of the type considered in Example 8b there will always be a unique family of sets $\{C_1, \ldots, C_m\}$, none of which is a subset of another, such that the system will be failed if and only if all the components of at least one of these sets are failed. These sets are called the minimal cut sets of the system.

Let $Y_j$, $j = 1, \ldots, n$, equal 1 if component $j$ is failed and let it equal 0 otherwise, and let $q_j = P\{Y_j = 1\}$ denote the probability that component $j$ is failed. Now, for $i = 1, \ldots, m$, let

$$X_i = \prod_{j \in C_i} Y_j$$

That is, $X_i$ is the indicator for the event that all components in $C_i$ are failed. If we let $S = \sum_i X_i$ then $\theta$, the probability that the system is failed, is given by

$$\theta = P\{S > 0\}$$

We will now show how to make use of the proposition to efficiently estimate $\theta$.

First, let $\lambda_i = E[X_i] = \prod_{j \in C_i} q_j$ and let $\lambda = \sum_i \lambda_i$. Now, simulate the value of $J$, a random variable that is equal to $i$ with probability $\lambda_i/\lambda$, $i = 1, \ldots, m$. [It follows from Part (a) of the proposition that $J$ has the same distribution as the conditional distribution of $I$, given that $X_I = 1$.] Then set $Y_i$ equal to 1 for all $i \in C_J$, and simulate the value of all of the other $Y_i$, $i \notin C_J$, by letting them equal 1 with probability $q_i$ and 0 otherwise. Let $S^*$ denote the resulting number of minimal cut sets that have all their components down, and note that $S^* \ge 1$. From Part (c) of the proposition, it follows that $\lambda/S^*$ is an unbiased estimator of $\theta$. Since $S^* \ge 1$, it also follows that

$$0 \le \lambda/S^* \le \lambda$$

and so when $\lambda$, the mean number of minimal cut sets that are down, is very small the estimator $\lambda/S^*$ will have a very small variance.
For instance, consider a 3-of-5 system that fails if at least 3 of the 5 components are failed, and suppose that each component independently fails with probability $q$. For this system, the minimal cut sets will be the $\binom{5}{3} = 10$ subsets of size 3. Since all the component failures are the same, the value of $J$ will play no role. Thus, the preceding estimate can be obtained by supposing that components 1, 2, and 3 are all failed and then generating the status of the other two. Thus, by considering the number of components 4 and 5 that are failed, it follows since $\lambda = 10q^3$ that the distribution of the estimator is

$$\begin{aligned} P\{\lambda/S^* = 10q^3\} &= (1 - q)^2 \\ P\{\lambda/S^* = 10q^3/4\} &= 2q(1 - q) \\ P\{\lambda/S^* = 10q^3/10\} &= q^2 \end{aligned}$$

Hence, with $p = 1 - q$,

$$\operatorname{Var}(\lambda/S^*) = E[(\lambda/S^*)^2] - (E[\lambda/S^*])^2 = 100q^6\left[p^2 + pq/8 + q^2/100 - (p^2 + pq/2 + q^2/10)^2\right]$$


The following table gives the value of $\theta$ and the ratio of $\operatorname{Var}(R)$ to the variance of the estimator $\lambda/S^*$ for a variety of values of $q$, where $\operatorname{Var}(R) = \theta(1 - \theta)$ is the variance of the raw simulation estimator.

q        θ                Var(R)/Var(λ/S*)
0.001    9.985 × 10⁻⁹     8.896 × 10¹⁰
0.01     9.851 × 10⁻⁶     8,958,905
0.1      0.00856          957.72
0.2      0.05792          62.59
0.3      0.16308          12.29

Thus, for small $q$, $\operatorname{Var}(\lambda/S^*)$ is roughly of the order $\theta^2$, whereas $\operatorname{Var}(R) \approx \theta$. □
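A Python sketch of one way to implement the $\lambda/S^*$ estimator for a general list of minimal cut sets is given below. The 3-of-5 system is used as the example. This is an illustration, not the book's code: the name `cut_set_estimator` is hypothetical, components are numbered from 0, and $J$ is drawn with `random.choices` using weights $\lambda_i$.

```python
import math
import random
from itertools import combinations

def cut_set_estimator(cut_sets, q, runs, rng=random):
    """Average of `runs` independent copies of lambda / S*.
    cut_sets: minimal cut sets C_1..C_m as sets of component indices;
    q: list of component failure probabilities q_j."""
    lam_i = [math.prod(q[j] for j in c) for c in cut_sets]
    lam = sum(lam_i)
    total = 0.0
    for _ in range(runs):
        # J = i with probability lambda_i / lambda (part (a) of the proposition).
        J = rng.choices(range(len(cut_sets)), weights=lam_i)[0]
        # Components in C_J are failed; every other one fails w.p. q_j.
        failed = [j in cut_sets[J] or rng.random() < q[j] for j in range(len(q))]
        # S* = number of minimal cut sets with all components failed (>= 1).
        s_star = sum(all(failed[j] for j in c) for c in cut_sets)
        total += lam / s_star
    return total / runs

# The 3-of-5 system: the minimal cut sets are the 10 subsets of size 3.
cut_sets_3_of_5 = [set(c) for c in combinations(range(5), 3)]
```

With $q = 0.1$, the estimate should be close to $\theta = 0.00856$ from the table. It never exceeds $\lambda = 10q^3$.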


Example 11c Waiting for a Pattern Let $Y_i$, $i \ge 1$, be a sequence of independent and identically distributed discrete random variables with probability mass function $P_j = P\{Y_i = j\}$. Let $i_1, \ldots, i_k$ be a fixed sequence of possible values of these random variables and define

$$N = \min\{i : i \ge k, \; Y_{i-j} = i_{k-j}, \; j = 0, 1, \ldots, k-1\}$$

That is, $N$ is the first time the pattern $i_1, \ldots, i_k$ occurs. We are interested in using simulation to estimate $\theta = P\{N \le n\}$, in cases where $\theta$ is small. Whereas the usual simulation estimator is obtained by simulating the sequence of random variables until either the pattern occurs or it is no longer possible for it to occur by time $n$ (and letting the estimator for that run be 1 in the former case and 0 in the latter), we will show how the preceding proposition can be applied to obtain a more efficient simulation estimator.

To begin, let

$$X_i = 1 \quad \text{if } Y_i = i_k, \; Y_{i-1} = i_{k-1}, \; \ldots, \; Y_{i-k+1} = i_1$$

and let it be 0 otherwise. In other words, $X_i$ is equal to 1 if the pattern occurs (not necessarily for the first time) at time $i$. Let

$$S = \sum_{i=k}^{n} X_i$$

denote the number of times the pattern has occurred by time $n$ and note that

$$\theta = P\{N \le n\} = P\{S > 0\}$$

Since, for $k \le i \le n$,

$$\lambda_i = P\{X_i = 1\} = P_{i_1}P_{i_2}\cdots P_{i_k} \equiv p$$

it follows from the proposition that

$$\theta = (n - k + 1)p\,E\left[\frac{1}{S} \,\middle|\, X_I = 1\right]$$

where $I$, independent of the $Y_i$, is equally likely to be any of the values $k, \ldots, n$. Thus, we can estimate $\theta$ by first simulating $J$, equally likely to be any of the values $k, \ldots, n$, and setting

$$Y_J = i_k, \quad Y_{J-1} = i_{k-1}, \quad \ldots, \quad Y_{J-k+1} = i_1$$
We then simulate the other $n - k$ values $Y_i$ according to the mass function $P$ and let $S^*$ denote the number of times the pattern occurs. The simulation estimator of $\theta$ from this run is

$$\hat\theta = \frac{(n - k + 1)p}{S^*}$$

For small values of $(n - k + 1)p$, the preceding will be a very efficient estimator of $\theta$. □
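One run of the pattern estimator can be sketched as follows. It is an illustration under assumptions of our own:

- The name `pattern_estimate` is hypothetical.
- The mass function is passed as a dict.
- The sequence is stored 1-based to match the text.

Drawing all $n$ values and then overwriting the $k$ positions ending at $J$ is equivalent to drawing only the other $n - k$ values from $P$.

```python
import random

def pattern_estimate(pmf, pattern, n, rng=random):
    """One run of theta_hat = (n - k + 1) p / S* for theta = P{N <= n}.
    pmf: dict value -> probability; pattern: sequence i_1, ..., i_k."""
    pattern = list(pattern)
    k = len(pattern)
    p = 1.0
    for v in pattern:
        p *= pmf[v]                              # p = P_{i_1} ... P_{i_k}
    values, weights = zip(*pmf.items())
    Y = [None] + rng.choices(values, weights=weights, k=n)   # Y[1..n]
    J = rng.randint(k, n)                        # equally likely in k, ..., n
    Y[J - k + 1 : J + 1] = pattern               # force the pattern to end at time J
    s_star = sum(Y[i - k + 1 : i + 1] == pattern for i in range(k, n + 1))
    return (n - k + 1) * p / s_star
```

Averaging many runs estimates $\theta$. For instance, with `pmf = {0: 0.9, 1: 0.1}` and pattern `[1, 1, 1]`, the average approximates the probability of three consecutive 1s within the first $n$ trials.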
11.4 Estimating the Distribution and the Mean of the First
Passage Time of a Markov Chain 


A Markov chain whose states are the nonnegative integers describes a process that moves from state to state in the following manner: whenever it is in state $i$ then, regardless of its previous states, the next state is $j$ with probability $P_{i,j}$, where $P_{i,j}$, $i \ge 0$, $j \ge 0$, are specified nonnegative numbers such that

$$\sum_j P_{i,j} = 1, \quad i \ge 0$$

Thus, if $X_k$ denotes the $k$th state of this process, then

$$P\{X_{k+1} = j \mid X_k = i, X_{k-1}, \ldots, X_0\} = P_{i,j}$$
Let $S$ denote a subset of states of the Markov chain, and suppose that for given initial state $X_0 = 0 \notin S$, we are interested in the distribution of the first time that the Markov chain enters one of the states in $S$. Specifically, we want to estimate

$$p = P\{T > n\}$$

where

$$T = \min\{k : X_k \in S\}$$
The raw simulation approach would first generate $X_1, \ldots, X_n$ in sequence. That is, it would generate $X_1$ from the mass function

$$P\{X_1 = j \mid X_0 = 0\} = P_{0,j}, \quad j \ge 0$$

If the generated value of $X_1$ is $i$, then $X_2$ would be generated from the mass function

$$P\{X_2 = j \mid X_1 = i\} = P_{i,j}, \quad j \ge 0$$

and so on. It would then let

$$I = \begin{cases} 1, & \text{if } X_i \notin S \text{ for all } i = 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

This would then be repeated many times, and the average value of $I$ obtained over these runs would be the raw simulation estimator of $p$.
To obtain an unbiased estimator having a smaller variance than $I$, let, for $i \notin S$,

$$\alpha(i) = \sum_{k \notin S} P_{i,k}$$

be the probability that the next state from $i$ is not in $S$. Conditioning on whether $X_1 \in S$ gives

$$E[I] = E[I \mid X_1 \in S](1 - \alpha(0)) + E[I \mid X_1 \notin S]\alpha(0) = E[I \mid X_1 \notin S]\alpha(0)$$

since $I = 0$ whenever $X_1 \in S$. Thus, from stratified sampling results we can conclude that an improved estimator is obtained by generating the initial state according to the conditional distribution of $X_1$ given that $X_1 \notin S$. That is, if we generate the value of $X_1$ from the mass function

$$P\{X_1 = j \mid X_0 = 0, X_1 \notin S\} = \frac{P_{0,j}}{\sum_{r \notin S} P_{0,r}}, \quad j \notin S \tag{11.6}$$

and then generate the succeeding states according to the Markov chain transition probabilities, we obtain the improved estimator $I\alpha(0)$. However, rather than generating the succeeding states according to the Markov chain transition probabilities, we can further reduce the variance of the estimator by continually applying the preceding argument. That is, if the value of $X_1$ obtained from the mass function (11.6) is $k$, then we can generate $X_2$ from its conditional probability mass function given that it is not in $S$; that is, we generate $X_2$ from the mass function

$$P\{X_2 = j \mid X_1 = k, X_2 \notin S\} = \frac{P_{k,j}}{\sum_{r \notin S} P_{k,r}}, \quad j \notin S$$

Continuing in this manner leads to the estimator, call it $\hat p$,

$$\hat p = \prod_{i=1}^{n} \alpha(X_{i-1})$$

where $X_i$, $i > 0$, is generated conditional on the value of $X_{i-1}$ and the fact that $X_i \notin S$.
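The estimator $\hat p$ can be sketched for a chain with finitely many states. This is an assumption of the sketch; the text allows the nonnegative integers. The name `first_passage_estimate` and the list-of-rows representation of $P_{i,j}$ are illustrative choices, not from the text.

```python
import random

def first_passage_estimate(P, S, n, x0=0, rng=random):
    """One run of p_hat = prod_{i=1}^n alpha(X_{i-1}) for p = P{T > n}.
    P: list of rows with P[i][j] = P_{i,j}; S: set of target states; x0 not in S."""
    allowed = [j for j in range(len(P)) if j not in S]
    est = 1.0
    x = x0
    for _ in range(n):
        weights = [P[x][j] for j in allowed]
        alpha = sum(weights)               # alpha(x) = P{next state not in S}
        if alpha == 0.0:
            return 0.0                     # the chain surely enters S next
        est *= alpha
        # Next state drawn from P_{x,j} / alpha(x), j not in S.
        x = rng.choices(allowed, weights=weights)[0]
    return est
```

The average of many such runs estimates $P\{T > n\}$. Each run lies between 0 and 1, which is where the variance reduction over the indicator $I$ comes from.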


Example 11d: Estimating Multivariate Tail Probabilities Let $Y_1, \ldots, Y_n$ be random variables having a specified joint distribution. Suppose that, for each $i = 1, \ldots, n$, the conditional distribution of $Y_i$ given the values of $Y_1, \ldots, Y_{i-1}$ is both computable and easily simulated from. Now, suppose that we want to estimate

$$p = P\{Y_1 > a_1, \ldots, Y_n > a_n\}$$

for specified values $a_i$, $i = 1, \ldots, n$. To estimate $p$, we can utilize the preceding method by considering the Markov chain whose $j$th state is the vector $(Y_1, \ldots, Y_j)$. We let

$$S = \{(y_1, \ldots, y_j) : j \ge 1, \; y_k \le a_k \text{ for some } k \le j\}$$

Thus, if $T$ is the number of transitions that it takes until a state in $S$ occurs, then

$$T > n \iff Y_1 > a_1, \ldots, Y_n > a_n$$

Thus we can estimate $p$ as follows:

1. Generate $Y_1$ conditional on $Y_1 > a_1$; call the generated value $y_1$.
2. For $i = 2$ to $n - 1$, generate $Y_i$ conditional on $Y_k = y_k$, $k = 1, \ldots, i-1$, and $Y_i > a_i$; call the generated value $y_i$.

The estimate of $p$ from this run would be

$$\hat p = P\{Y_1 > a_1\}\prod_{i=2}^{n} P\{Y_i > a_i \mid Y_1 = y_1, \ldots, Y_{i-1} = y_{i-1}\}$$
i=2 


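As an illustration of this procedure, suppose that we want to estimate the tail probabilities of bivariate normal random variables, where the random variables $Y_1$ and $Y_2$ are said to have a bivariate normal distribution if their joint density function is

$$f(y_1, y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{y_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{y_2 - \mu_2}{\sigma_2}\right)^2 - 2\rho\frac{(y_1 - \mu_1)(y_2 - \mu_2)}{\sigma_1\sigma_2}\right]\right\}$$

It can be shown that $Y_1$ is normal with mean $\mu_1$ and variance $\sigma_1^2$; $Y_2$ is normal with mean $\mu_2$ and variance $\sigma_2^2$; and $\rho$ is their correlation. Further, given that $Y_1 = y_1$, the conditional distribution of $Y_2$ is normal with mean

$$\mu_2(y_1) = E[Y_2 \mid Y_1 = y_1] = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(y_1 - \mu_1)$$

and variance

$$\sigma_2^2(y_1) = \operatorname{Var}(Y_2 \mid Y_1 = y_1) = \sigma_2^2(1 - \rho^2)$$

Thus, we can estimate

$$p = P\{Y_1 > a_1, Y_2 > a_2\}$$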
by simulating $Y_1$ conditional on it exceeding $a_1$. (See Section 8.7 for an explanation of how to efficiently simulate a normal random variable conditional on its value exceeding some constant.) If its simulated value is $y_1$, then the estimate of $p$ from that run is

$$\hat p = P\{Y_1 > a_1\}P\{Y_2 > a_2 \mid Y_1 = y_1\} = \bar\Phi\left(\frac{a_1 - \mu_1}{\sigma_1}\right)\bar\Phi\left(\frac{a_2 - \mu_2(y_1)}{\sigma_2(y_1)}\right)$$

where $\bar\Phi(x)$ is the probability that a standard normal random variable is larger than $x$.
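Because $P\{Y_2 > a_2 \mid Y_1 = y_1\}$ is monotone in $y_1$ (either increasing if $\rho > 0$, or decreasing if $\rho < 0$), we can use the generated value of $Y_1$ as a control variable. However, to do so, we must first compute its mean, $E[Y_1 \mid Y_1 > a_1]$. This can be accomplished by noting that for a standard normal random variable $Z$

$$E[Z \mid Z > a] = \frac{1}{\bar\Phi(a)}\frac{1}{\sqrt{2\pi}}\int_a^\infty x e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}\,\bar\Phi(a)}e^{-a^2/2}$$

Thus, writing

$$Y_1 = \mu_1 + \sigma_1 Z$$

it follows that

$$Y_1 > a_1 \iff Z > \frac{a_1 - \mu_1}{\sigma_1}$$

Therefore,

$$E[Y_1 \mid Y_1 > a_1] = E\left[\mu_1 + \sigma_1 Z \,\middle|\, Z > \frac{a_1 - \mu_1}{\sigma_1}\right] = \mu_1 + \frac{\sigma_1}{\sqrt{2\pi}\,\bar\Phi\left(\frac{a_1 - \mu_1}{\sigma_1}\right)}e^{-(a_1 - \mu_1)^2/(2\sigma_1^2)}$$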
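The bivariate normal estimator, with $Y_1$ as a control variable, can be sketched in Python. This is an illustration with several stated assumptions:

- The names `upper_tail` and `bvn_tail_estimate` are hypothetical.
- $|\rho| < 1$ is assumed.
- $Z$ given $Z > b$ is generated by inverse transform on the lower tail of $-Z$ via `statistics.NormalDist.inv_cdf`. This is a swapped-in method, not the Section 8.7 procedure, and it is adequate for moderate thresholds.
- The control-variate coefficient $c^* = -\operatorname{Cov}/\operatorname{Var}$ is estimated from the same runs.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def upper_tail(x):
    """bar-Phi(x) = P{Z > x} for standard normal Z (erfc keeps tail accuracy)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bvn_tail_estimate(a1, a2, mu1, mu2, s1, s2, rho, runs, rng=random):
    """Estimate p = P{Y1 > a1, Y2 > a2}; returns (raw mean, control-variate mean)."""
    b1 = (a1 - mu1) / s1
    tail1 = upper_tail(b1)                           # P{Y1 > a1}
    cond_sd = s2 * math.sqrt(1.0 - rho * rho)        # sigma_2(y_1)
    # Known mean of the control variable: E[Y1 | Y1 > a1].
    ctrl_mean = mu1 + s1 * math.exp(-b1 * b1 / 2) / (math.sqrt(2 * math.pi) * tail1)
    ests, ctrls = [], []
    for _ in range(runs):
        # Z | Z > b1 via inverse transform: -Z is drawn from its lower tail.
        z = -STD.inv_cdf((1.0 - rng.random()) * tail1)
        y1 = mu1 + s1 * z
        cond_mean = mu2 + rho * (s2 / s1) * (y1 - mu1)   # mu_2(y_1)
        ests.append(tail1 * upper_tail((a2 - cond_mean) / cond_sd))
        ctrls.append(y1)
    m_e = sum(ests) / runs
    m_c = sum(ctrls) / runs
    cov = sum((e - m_e) * (c - m_c) for e, c in zip(ests, ctrls)) / (runs - 1)
    var = sum((c - m_c) ** 2 for c in ctrls) / (runs - 1)
    c_star = -cov / var
    return m_e, m_e + c_star * (m_c - ctrl_mean)
```

A convenient check is the standard bivariate normal with $a_1 = a_2 = 0$. There $P\{Y_1 > 0, Y_2 > 0\} = \frac{1}{4} + \frac{\arcsin\rho}{2\pi}$, which equals $1/3$ when $\rho = 0.5$.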


Suppose now that we want to estimate $E[T]$, the expected time until the Markov chain enters one of the states in $S$. To obtain $E[T]$, we first show that $E[T]$ can be written as the sum of the probabilities $P\{T > j\}$. To show this, let $I_j$ be the indicator of the event that $T > j$. That is,

$$I_j = \begin{cases} 1, & \text{if } j < T \\ 0, & \text{otherwise} \end{cases}$$

Because

$$\sum_{j=0}^{\infty} I_j = \sum_{j=0}^{T-1} I_j + \sum_{j=T}^{\infty} I_j = T + 0 = T$$

we obtain, upon taking expectations, that

$$E[T] = E\left[\sum_{j=0}^{\infty} I_j\right] = \sum_{j=0}^{\infty} E[I_j] = \sum_{j=0}^{\infty} P\{T > j\}$$

To estimate $P\{T > j\}$, choose a large number $n$, and simulate $X_1, \ldots, X_n$ in sequence, where, as in the preceding, $X_i$ is simulated conditional on the value of $X_{i-1}$ and on the event that $X_i \notin S$. After simulating these quantities, use the transition probabilities of the Markov chain to simulate $X_{n+1}, X_{n+2}, \ldots$, stopping the run when you obtain one that is in $S$. (That is, $X_i$, $i > n$, is simulated conditional on the generated value of $X_{i-1}$ but not on the event that $X_i \notin S$.) If $X_m \in S$ is the final value simulated, then $e_j$, the estimate of $P\{T > j\}$ from this run, is

$$e_j = \begin{cases} 1, & \text{if } j = 0 \\ \prod_{i=1}^{j} \alpha(X_{i-1}), & \text{if } 1 \le j \le n \\ \prod_{i=1}^{n} \alpha(X_{i-1}), & \text{if } n < j < m \\ 0, & \text{if } j \ge m \end{cases}$$

The estimator of $E[T]$ from this run is $\sum_{j \ge 0} e_j$; the average of this quantity over many runs is the overall estimator.
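The run-by-run estimator $\sum_j e_j$ of $E[T]$ can be sketched for a finite chain. As before, finiteness is an assumption of the sketch, and the name `mean_passage_estimate` is hypothetical. The code accumulates $e_1, \ldots, e_n$ as running products of the $\alpha$'s. It then simulates without conditioning until the chain enters $S$ at time $m$, and adds the $m - n - 1$ remaining copies of the full product.

```python
import random

def mean_passage_estimate(P, S, n, x0=0, rng=random):
    """One run of sum_j e_j, an estimate of E[T] for a finite chain with
    transition rows P[i], target set S, and initial state x0 not in S."""
    states = range(len(P))
    allowed = [j for j in states if j not in S]
    total = 1.0                    # e_0 = 1
    prod = 1.0
    x = x0
    for _ in range(n):
        weights = [P[x][j] for j in allowed]
        alpha = sum(weights)
        if alpha == 0.0:
            return total           # every later e_j is 0
        prod *= alpha
        total += prod              # e_j for 1 <= j <= n
        x = rng.choices(allowed, weights=weights)[0]   # X_j given X_j not in S
    # Unconditioned steps X_{n+1}, X_{n+2}, ... until the chain is in S at time m.
    m = n
    while x not in S:
        x = rng.choices(states, weights=P[x])[0]
        m += 1
    return total + (m - n - 1) * prod                  # e_j for n < j < m
```

Averaging many runs gives the overall estimator of $E[T]$. For a finite chain it can be compared with the solution of $t = 1 + Qt$, where $Q$ is $P$ restricted to the states outside $S$.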


11.5 Coupling from the Past 


Consider an irreducible Markov chain with states $1, \ldots, m$ and transition probabilities $P_{i,j}$ and suppose we want to generate the value of a random variable whose distribution is that of the stationary distribution of this Markov chain (see Section 10.1 for relevant definitions). In Section 10.1 we noted that we could approximately generate such a random variable by arbitrarily choosing an initial state and then simulating the resulting Markov chain for a large fixed number of time periods; the final state is used as the value of the random variable. In this section we present a procedure that generates a random variable whose distribution is exactly that of the stationary distribution.

If, in theory, we generated the Markov chain starting at time $-\infty$ in any arbitrary state, then the state at time 0 would have the stationary distribution. So imagine that we do this, and suppose that a different person is to generate the next state at each of these times. Thus, if $X(-n)$, the state at time $-n$, is $i$, then person $-n$ would generate a random variable that is equal to $j$ with probability $P_{i,j}$, $j = 1, \ldots, m$, and the value generated would be the state at time $-(n-1)$.
Now suppose that person $-1$ wants to do his random variable generation early. Because he does not know what the state at time $-1$ will be, he generates a sequence of random variables $N_{-1}(i)$, $i = 1, \ldots, m$, where $N_{-1}(i)$, the next state if $X(-1) = i$, is equal to $j$ with probability $P_{ij}$, $j = 1, \ldots, m$. If it results that $X(-1) = i$, then person $-1$ would report that the state at time 0 is


$$S_{-1}(i) = N_{-1}(i), \quad i = 1, \ldots, m$$


(That is, $S_{-1}(i)$ is the simulated state at time 0 when the simulated state at time $-1$ is $i$.)

Now suppose that person $-2$, hearing that person $-1$ is doing his simulation early, decides to do the same thing. She generates a sequence of random variables $N_{-2}(i)$, $i = 1, \ldots, m$, where $N_{-2}(i)$ is equal to $j$ with probability $P_{ij}$, $j = 1, \ldots, m$. Consequently, if it is reported to her that $X(-2) = i$, then she will report that $X(-1) = N_{-2}(i)$. Combining this with the early generation of person $-1$ shows that if $X(-2) = i$, then the simulated state at time 0 is


$$S_{-2}(i) = S_{-1}(N_{-2}(i)), \quad i = 1, \ldots, m$$


Continuing in the preceding manner, suppose that person $-3$ generates a sequence of random variables $N_{-3}(i)$, $i = 1, \ldots, m$, where $N_{-3}(i)$ is to be the generated value of the next state when $X(-3) = i$. Consequently, if $X(-3) = i$, then the simulated state at time 0 would be

$$S_{-3}(i) = S_{-2}(N_{-3}(i)), \quad i = 1, \ldots, m$$

Now suppose we continue the preceding, and so obtain the simulated functions

$$S_{-1}(i),\; S_{-2}(i),\; S_{-3}(i),\; \ldots, \quad i = 1, \ldots, m$$


Going backwards in time in this manner, we will at some time, say $-r$, have a simulated function $S_{-r}(i)$ that is a constant function. That is, for some state $j$, $S_{-r}(i)$ will equal $j$ for all states $i = 1, \ldots, m$. But this means that no matter what the simulated values are from time $-\infty$ to $-r$, we can be certain that the simulated value at time 0 is $j$. Consequently, $j$ can be taken as the value of a generated random variable whose distribution is exactly that of the stationary distribution of the Markov chain.
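The procedure just described can be sketched in Python as below. This is an illustration under stated assumptions, not a definitive implementation. States are numbered $0, \ldots, m-1$ instead of $1, \ldots, m$. Each $N_{-k}(i)$ is drawn independently from row $i$ of the transition matrix. The list `S` stores the current function $S_{-k}$, so the draws of persons $-1, \ldots, -(k-1)$ are kept, not regenerated, each time we step one period further back. The function name and the `max_steps` guard are hypothetical additions. The guard matters because coalescence is not automatic for every chain. For the periodic chain that always switches between two states, these independent draws give $N = (1, 0)$ every time, so $S$ never becomes constant.

```python
import random

def cftp(P, rng, max_steps=100_000):
    """One coupling-from-the-past draw for the chain with transition matrix P.

    States are numbered 0..m-1 (the text numbers them 1..m);
    P[i][j] is the probability of moving from state i to state j.
    """
    states = list(range(len(P)))
    # S[i]: simulated state at time 0 if the state at time -k is i.
    # Before any step back (k = 0), S is the identity map.
    S = states[:]
    for _ in range(max_steps):
        # Person -k draws N_{-k}(i) from row i of P, independently for each i.
        N = [rng.choices(states, weights=P[i])[0] for i in states]
        # S_{-k}(i) = S_{-(k-1)}(N_{-k}(i)); earlier draws live inside S.
        S = [S[N[i]] for i in states]
        if len(set(S)) == 1:   # constant function: every start gives the same state
            return S[0]
    raise RuntimeError("no coalescence within max_steps")
```

For example, the chain `P = [[0.5, 0.5], [0.2, 0.8]]` has stationary probabilities $(2/7, 5/7)$, from the balance equation $0.5\,\pi_0 = 0.2\,\pi_1$. Repeated calls to `cftp(P, random.Random(1))` should therefore return state 0 about 28.6% of the time.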


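Example 11e Consider a Markov chain with states 1, 2, 3 and suppose that simulation yielded the values

$$
N_{-1}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 2, & \text{if } i = 2 \\ 2, & \text{if } i = 3 \end{cases}
$$

and

$$
N_{-2}(i) = \begin{cases} 1, & \text{if } i = 1 \\ 3, & \text{if } i = 2 \\ 1, & \text{if } i = 3 \end{cases}
$$

Then

$$
S_{-2}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 2, & \text{if } i = 2 \\ 3, & \text{if } i = 3 \end{cases}
$$

If

$$
N_{-3}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 1, & \text{if } i = 2 \\ 1, & \text{if } i = 3 \end{cases}
$$

then

$$
S_{-3}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 3, & \text{if } i = 2 \\ 3, & \text{if } i = 3 \end{cases}
$$

Therefore, no matter what the state is at time $-3$, the state at time 0 will be 3. □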
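To check the arithmetic of Example 11e, the short Python sketch below applies the recursion $S_{-k}(i) = S_{-(k-1)}(N_{-k}(i))$ to the values of $N_{-1}$, $N_{-2}$, and $N_{-3}$ given in the example. Dictionaries keep the state labels 1, 2, 3. The helper name `step_back` is hypothetical.

```python
def step_back(S_prev, N):
    """Return S_{-k} from S_{-(k-1)} and N_{-k}: S_{-k}(i) = S_{-(k-1)}(N_{-k}(i))."""
    return {i: S_prev[N[i]] for i in N}

# Values of N_{-1}, N_{-2}, N_{-3} from Example 11e (states 1, 2, 3)
N1 = {1: 3, 2: 2, 3: 2}
N2 = {1: 1, 2: 3, 3: 1}
N3 = {1: 3, 2: 1, 3: 1}

S1 = dict(N1)            # S_{-1}(i) = N_{-1}(i)
S2 = step_back(S1, N2)
S3 = step_back(S2, N3)
print(S2)  # {1: 3, 2: 2, 3: 3}
print(S3)  # {1: 3, 2: 3, 3: 3}: constant, so the state at time 0 is 3
```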


Remarks The procedure developed in this section for generating a random variable whose distribution is the stationary distribution of the Markov chain is called coupling from the past. □


Exercises 


1. Set up the alias method for generating a binomial with parameters (5, 0.4). 


2. Explain how we can number the $Q^{(k)}$ in the alias method so that $k$ is one of the two points to which $Q^{(k)}$ gives weight.


3. Complete the details of Example 11a. 


4. Write a program to generate the points of a two-dimensional Poisson process within a circle of radius $R$, and run the program for $\lambda = 1$ and $R = 5$. Plot the points obtained.
5. Use simulation to estimate the probability that the bridge structure given
in Figure 8.1 will fail if components 1, 2, and 3 all, independently, fail with 
probability 0.05, and 4 and 5 with probability 0.01. Also, compare the variance 
of your estimator with that of the raw simulation estimator. 


6. Use simulation to estimate the probability that a run of 10 consecutive heads 
occurs within the first 100 flips of a fair coin. Also, compare the variance of 
your estimator with that of the raw simulation estimator. 


7. Use simulation to estimate the probability that the pattern HTTHT occurs within the first 20 flips of a coin whose probability of coming up heads on flip $i$ is $(i + 10)/40$, $i = 1, \ldots, 20$. Assume independence.


8. Explain how the variance of the simulation estimator in Example 11b can 
be further reduced by using antithetic variables. 


9. In Exercise 5, determine the additional variance reduction obtained by also 
using antithetic variables. 


10. Let $X_i$ be Bernoulli random variables with means $\lambda_i$, $i = 1, \ldots, m$, and let $S = \sum_{i=1}^{m} a_i X_i$, where the $a_i$ are positive constants. Let $R$ be an arbitrary random variable, and let $I$, independent of the other variables, have mass function $P\{I = i\} = a_i\lambda_i / \sum_j a_j\lambda_j$, $i = 1, \ldots, m$. Set $\lambda = \sum_i a_i\lambda_i$.

(b) Show that $E[SR] = \lambda E[R \mid X_I = 1]$.
(c) Show that $P\{S > x\} = \lambda E[I(S > x)/S \mid X_I = 1]$, where $I(S > x)$ is 1 if $S > x$ and 0 otherwise.


11. Suppose that a set of $n$ components, with component $j$ independently functioning with probability $p_j$, $j = 1, \ldots, n$, is available. There are $m$ experiments that we want to perform. However, in order to perform experiment $i$, all of the components in the set $A_i$, $i = 1, \ldots, m$, must function. If experiment $i$ can be performed, then we earn an amount $a_i$. With the total return being the sum of the returns from all $m$ experiments, suppose that we are interested in estimating the probability that the total return exceeds $x$. Assuming that this probability is small, give an efficient simulation procedure for estimating it.


12. Let $Z_1, Z_2, \ldots$ denote a sequence of independent standard normals. Let


M = Zn + Zn-1 + Zn-2 RE 


i 7 >, ned 


and define N by 
We want to use simulation to find E[N]. Determine the variance of the raw
simulation estimator and then one that uses the ideas of Section 11.4. (The above 
is called a moving-average control chart and is used to determine when the 
distribution has shifted from the unit normal.) 